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Abstract: The amount of computation required to solve many early vision prob- 
lems is prodigious, and so it has long been thought that systems that operate in 
a reasonable amount of time will only become feasible when parallel systems be- 
come available. Such systems now exist in digital form, but most are large and 
expensive. These machines constitute an invaluable test-bed for the development of 
new algorithms, but they can probably not be scaled down rapidly in both physical 
size and cost, despite continued advances in semiconductor technology and machine 
architecture. 

Simple analog networks can perform interesting computations, as has been 
known for a long time. We have reached the point where it is feasible to exper- 
iment with implementation of these idea in VLSI form, particularly if we focus on 
networks composed of locally interconnected passive elements, linear amplifiers, and 
simple nonlinear components. While there have been excursions into the develop- 
ment of ideas in this area since the very beginnings of work on machine vision, much 
work remains to be done. Progress will depend on careful attention to matching of 
the capabilities of simple networks to the needs of early vision. 

Note that this is not at all intended to be anything like a review of the field, 
but merely a collection of some ideas that seem to be interesting. 
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0. Introduction 

The term "parallel networks" in the title may appear to be redundant, since 
the computations at different nodes of an analog network naturally proceed 
in parallel. In several of the examples explored here, however, a number of 
different interacting networks are used, and these do indeed operate "in par- 
allel." We have to try and understand the kinds of computations that simple 
networks can perform and then use them as components in more complex 
systems designed to solve early vision problem. 

Some of the ideas are first developed in continuous form, where we deal, 
for example, with resistive sheets instead of a regular grid of resistors. This 
is because the analysis of the continuous version is often simpler, and lends 
itself to well known mathematical techniques. Some thought must, of course, 
be given to what happens when we approximate this continuous world with 
a discrete one. This typically includes mathematical questions about accu- 
racy and convergence, but also requires that the network be laid out on a 
two-dimensional plane, since today's implementations allow only very limited 
stacking in the third dimension. This can be a problem in the case where the 
network is inherently three-dimensional, or layered, or where several networks 
are used cooperatively. There are four major topics addresses here: 

1. A Gaussian convolver for smoothing that operates continuously in time. 

2. Coupled resistive networks for interpolation. 

3. Moment calculation methods for determining position and orientation. 

4. Systems for recovering motion and shape from time- varying images. 

In the process we touch on several important subtopics, including: 

Feedback methods for solving constrained optimization problems using 

gradient projection, normalization and penalty functions. 

Interlaced arrangements of the cells of the layers of a multi-resolution 

network on a two-dimensional surface. 

Tradeoffs between closed form solutions favored on serial computers and 

iterative or feedback methods better suited for analog networks. 

Laying out time as an extra spatial dimension so as to build a system in 

which information flows continuously. 

An equivalence between two apparently quite differently uses of a resistive 

network. 
Note, by the way, that the four sections of this memo are fairly independent 
and not arranged in any particular order. 



2 Parallel Networks for Machine Vision 

1. A Non-Clocked Gaussian Convolver for Smoothing. 

Gaussian convolution is a useful smoothing operation, often used in early vi- 
sion, particularly in conjunction with discrete operators that estimate deriva- 
tives. There exist several digital hardware implementations, including one 
that exploits the separability of the two-dimensional Gaussian operator into 
the convolution of two one- dimensional Gaussian operators [Larson et al. 81]. 
Analog implementations have also been proposed that use the fact that the 
solution of the heat-equation at a certain time is the convolution of a Gaussian 
kernel with the initial temperature distribution [Knight 83]. 

One novel feature of the scheme described here is that data flows through 
continuously, with output available at any time. Another is an elegant way 
of interlacing the nodes of layers at several resolutions. First comes a brief 
review of why there is interest in Gaussian convolution. 

1.1. Edge Detection 

The detection of step-edge transitions in image brightness involves numeri- 
cal estimation of derivatives. As such it is an ill-posed problem [Poggio & 
Torre 84] [Torre & Poggio 86]. All but the earliest efforts (see, for example, 
[Roberts 65]) employed a certain degree of smoothing before or after applica- 
tion of finite difference operators in order to obtain a more stable estimate. 
Equivalently, they used computational molecules of large support (see, for 
example, [Horn 71]). While most of the early work focused on the image 
brightness gradient, that is, the first partial derivatives of image brightness, 
there where some suggestion that second-order partial derivatives might be 
useful. Rotationally symmetric ones appeared particularly appealing and it 
was noted that the Laplacian is the lowest order linear operator that (almost) 
allows recovery of the image information from the result [Horn 72, 74]. 

It was also clear early on that smoothing filters should be weighted so as 
to put less emphasis on points further away than those nearby 1 . The Gaussian 



There was, however, intense disagreement about whether the composite edge 
operator should have a sharp transition in the middle or not. Some argued that 
the transition should be rapid, since a matched filter has an impulse response 
equal to the signal being detected, which in this case was assumed to be an 
ideal step transition. Others claimed that the aim was to suppress higher spatial 
frequencies to improve the signal to noise ratio. This latter argument took into 
account the fact that the signal drops off at higher frequencies while the noise 
spectrum tends to be fairly flat. The view of the edge operator as a composition 
of a smoothing filter and a finite difference approximation of a derivative finally 
reinforced the latter view. 
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was popular for smoothing because of a number of its mathematical proper- 
ties, including the fact that the two-dimensional Gaussian can be viewed as 
the product of two one- dimensional Gaussians, and, much more importantly, 
as the convolution of two one- dimensional Gaussians [Horn 72]. This gave 
rise to the hope that it might be computed with reasonable efficiency, an 
important matter when one is dealing with an image containing hundreds of 
thousands of picture cells. Note that the Gaussian is the only function that 
is both rotationally symmetric and separable in this fashion [Horn 72]. The 
separability property, which was the original impetus for choosing the Gaus- 
sian as a smoothing filter, was forgotten at times when proposals where made 
later to build hardware convolvers (but, see [Larson et al. 81]). 

Multi-Resolution Techniques 

There are other reasons for smoothing a discretized image, including sup- 
pression of higher spatial frequency components before sub-sampling. Sub- 
sampling of an image produces an image of lower resolution, one that contains 
fewer picture cells. Ideally, one would hope that this smaller image retains 
all of the information in the original higher resolution image, but this is, 
of course, in general not possible. The original image can be reconstructed 
only if it happens not to contain spatial frequency components that are too 
high to be represented in the sub-sampled version. This suggests suppressing 
higher frequency components before sub-sampling in order to avoid aliasing 
phenomena. An ideal low-pass filter should be used for this purpose 2 . While 
the Gaussian filter is a poor approximation to a low pass filter, it has the 
advantage that it does not have any over- or undershoot in either the spatial 
or the frequency domain. Consequently, the Gaussian smoothing operator has 
been used in several multi-scale schemes, despite the fact that it is not a good 
approximation to a low-pass filter. 

The difference of two spatially displaced Gaussians was used quite early 
on in edge detection [MacLeod 70a, 70b]. The idea of working at multiple 
scales occurred around about this time also ([Rosenfeld Sz Thurston 71, 72] 
and [Rosenfeld, Thurston k Lee 72]). An elegant theory of edge detection 
using zero- crossings of the Laplacian of the Gaussian at multiple scales was 
developed by Marr and Hildreth ([Marr k Hildreth 80] and [Hildreth 80, 83]). 
This reversed an earlier suggestion that a directional operator may be optimal 
[Marr 76]. 

Since then, it has been shown that the rotationally symmetric operators 



For an excellent finite support approximation to a low-pass filter look in [Rifman 
k McKinnon 74] [Bernstein 76] [Abdou k Wong 82]. 
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do have some drawbacks, including greater inaccuracy in edge location when 
the edge is not straight, as well as higher sensitivity to noise than directional 
operators (see, for example, [Berzins 84] and [Horn 86]). Operators for es- 
timating the second derivative in the direction of the largest first derivative 
(the so-called second directional derivative) have been proposed by Haral- 
ick [Haralick 84] (see also [Haralick 80] [Hartley 85] [Horn 86] ) 3 . Recently, 
Canny developed an operator that is optimal (in a sense he defines) in a one- 
dimensional version of the edge detection problem [Canny 83] . His operator is 
similar, but not equal to, the first derivative of a Gaussian. A straightforward 
(although ad hoc) extension of this operator to two-dimensions has recently 
become popular. 

If we view the problem as one of estimating the derivatives of a noisy sig- 
nal, we can apply Wiener's optimal filtering methods [Wiener 66] [Anderson 
h Moore 79]. Additive white noise is uncorrelated and so has a flat spectrum, 
while images typically have spectra that decrease as some power of frequency, 
starting form a low-frequency plateau [Ahuja & Schachter 83]. The magni- 
tude of the optimal filter response ends up being linear in frequency at low 
frequencies, then peaks and drops off as some power of frequency at higher 
frequencies. Under reasonable assumptions about the spectra of the ensemble 
of images being considered, this response may be considered to match (very 
roughly) the transofmr of the derivative of a Gaussian. 

The above suggests that while there is nothing really magical about the 
Gaussian smoothing filter, it has been widely accepted and has many desirable 
mathematical properties (although only a few of these were discussed here). 
It is thus of interest to find out whether convolutions with Gaussian kernels 
can be computed directly by simple analog networks. It is also desirable to 
find out whether the Laplacian of the convolution with a Gaussian, or the 
directional derivatives, can be computed directly. 

1.2. Binomial Filters 

In practice, we usually have to discretize and truncate the signal, as well as the 
filters we apply to it. If we sample and truncate a Gaussian, it loses virtually 
all of the interesting mathematical properties discussed above. In particular, 
truncation introduces discontinuities that assure that the transform of the 
filter will fall off only as the inverse of frequency at high frequencies, not 
nearly as fast as the transform of the Gaussian itself. Furthermore, while the 
transfer function of a suitable scaled Gaussian lies between zero and one for 



While the second directional derivative is a non-linear operator, it is coordinate- 
system independent, as is the Laplacian operator. 
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all frequencies, the transfer function of a truncated version will lie outside this 
range for some frequencies. These effects are small only when we truncate at 
a distance that is large compared to the spatial scale of the Gaussian. 

In addition, when we sample, we introduce aliasing effects, since the 
Gaussian is not a low-pass waveform. The aliasing effects are small only 
when we sample frequently in relation to the spatial scale of the Gaussian. 
It makes little sense to talk about convolution with a "discrete Gaussian" 
obtained by sampling with spacing comparable to the spatial scale, and by 
truncating at a distance comparable to the spatial scale of the underlying 
Gaussian. The resulting filter weights could have been obtained by sampling 
and truncating many other functions and so it is not reasonable to ascribe 
any of the interesting qualities of the Gaussian to such a set of weights. 

Instead, we note that the appropriate discrete analog of the Gaussian is 
the binomial filter, obtained by dividing the binomial coefficients of order n 
by 2 n so that they conveniently sum to one. Convolution of the binomial filter 
of order n with the binomial filter of order m yields the binomial filter of order 
(n + m), as can be seen by noting that multiplication of polynomials corre- 
sponds to convolution of their coefficients. The simplest binomial smoothing 
filter has the weights: 

"1 1 
2'2 

Higher order filters can be obtained by repeated convolution of this filter with 
itself: 

I ? I\ ,sa /I I\ _ /I 5 5 I 

4'4'4J l2'2j """ \8'8'8'8 
The transform of the binomial filter of order n is simply 

cos n u;/2, 

since the transform of the simple filter with two weights is just cosu;/2. This 
shows that the magnitude of the transform is never larger than one for any 
frequency, a property shared with a properly scaled Gaussian. Such a filter 
cannot amplify any frequency components, only attenuate them. 

1.3. Analog Implementation of Binomial Filters 

Binomial filters can be conveniently constructed using charge coupled device 
technology [Sage 84] [Sage & Lattes 87]. It is also possible to use potential 
dividers to perform the required averaging. Consider, for example, a uniform 
one- dimensional chain of resistors with inputs applied as potentials on even 
nodes and results read out as potentials on odd nodes. The potentials on 
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the odd nodes clearly are just averages of the potentials at neighboring even 
nodes 4 . 

One such resistive chain can be used to perform convolution with the 
simple two-weight binomial filter. To obtain convolution with higher-order 
binomial filters, we can reuse the same network, with inputs and outputs 
interchanged, provide that we have clocked sample- and- hold circuits attached 
to each node. At any particular time one half of the sample-and-hold circuits 
are presenting their potentials to the nodes they are attached to, while the 
other half are sampling the potentials on the remaining nodes. 

But we are here more interested in non-clocked circuits, where outputs 
are available continuously. The outputs of one resistive chain can be applied as 
input to another, provided that buffer amplifiers are interposed to prevent the 
second chain from loading the first one. We can cascade many such resistive 
chain devices to obtain convolutions with binomial filters of arbitrary order. 

It is possible to extend this idea to two dimensions. Consider nodes on 
a square grid, with each node connected to its four edge-adjacent neighbours 
by a resistor. Imagine coloring the nodes red and black, like the squares on 
a checker-board. Then the red nodes may be considered the inputs, where 
potentials are applied, while the black nodes are the outputs, where potentials 
are read out. Each output potential is the average of four input potentials, 
and each input potential contributes to four outputs. 

Unfortunately, the spatial scale of the binomial filter grows only with 
the square root of the number of stages used. Thus, while a lot of smoothing 
happens in the first few stages, it takes many more stages later in the sequence 
to obtain significantly additional smoothing. Also, the smoothed data has 
lost some of its high frequency content and so can perhaps be represented by 
fewer samples. These considerations suggest a multi-scale approach, where 
the number of nodes decreases from layer to layer. Averaging of neighbours 
at a later layer involves connections between nodes corresponding to points 
that are far apart in the original layer. Thus the smoothing that results in 
one of the later layers is over a larger spatial scale. We discuss later how to 
efficiently interlace the nodes of several such layers of different resolution on 
a two-dimensional surface. But first we will approach this smoothing method 
from the point of view of the equivalent continuous system. For this reason, 
we next review some properties of resistive sheets. 



The outputs in this case are offset by one half of the pixel spacing from the 
inputs, but this is not a real problem. In particular, an even number of such 
filtering stages produces results that are alogned with the original data. 
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1.4. Resistive Sheets 

Continuous resistive sheets solve Poisson's equation 

A«(x) = - j oi(x), 
where the output u(x) is the potential on the sheet at the point x, while the 
input z(x) is the current density injected and p is the resistivity "per square." 
Viewed this way, one can think of the solution of this second-order partial 
differential equation for u(x), given z(x), as application of the "inverse of the 
Laplacian operator." In image processing, we are usually concerned with the 
two-dimensional case 

Uxx(x,y) + u yy (x,y) = -pi(x,y). 
Discrete arrangements of resistors can be designed to solve difference approxi- 
mations of Poisson's equation. These resistive networks are remarkably robust 
against small changes in individual resistances and even errors in interconnec- 
tion. This has all been known for a very long time, used in analog computer 
simulations of steady state heat flow, for example, and has even been exploited 
in machine vision (see, for example, [Horn 74]). 

Discrete approximations of Poisson's equation can also be solved by a 
simple network of operational amplifiers and resistors [Horn 74]. The Lapla- 
cian of a function can be considered to be the limit of the convolution of the 
function with a rotationally symmetric center-surround operator of local sup- 
port, as the scale of this operator shrinks to zero. We can, for example, think 
of application of the Laplacian as the limit of convolution with 

r-2/7re 4 , for0<x 2 + y 2 < e 2 ; 
L t (x,y) = I +2/37re 4 , for e 2 < x 2 + y 2 < 4e 2 ; , 

I 0, for 4e 2 < x 2 + y 2 . 

as e tends to zero [Horn 86]. This view of the Laplacian is implicit in the usual 
discrete approximations of the Laplacian. A resistive network with operational 
amplifiers can compute the inverse of convolution with such a center-surround 
operator. Such networks actually can be used to easily invert any convolu- 
tion with local support, and so are more general than resistive sheets. They 
can accommodate spatially varying linear operators also, since the feedback 
arrangement merely has to mimic the forward operation [Horn 74]. 

Note that the inverse operation can, of course, not recover any spatial 
frequency components removed by the forward operation. In this case bound- 
ary conditions are needed to arrive at a unique stable solution of the inverse 
problem. Similarly, in the presence of noise, spatial frequency components 
that are strongly attenuated by the forward operation will be recovered in- 
accurately by the inverse operation. The inverses of local operations are in 
general global, but they have a special structure that makes it possible to 
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compute them using local feedback methods, as described. They are at times 
called quasi-local or pseudo- local for this reason. 

1.5. Solving the Heat Equation 

By adding capacitance, and removing the input, a resistive sheet can be used 
to solve the heat equation 

Au(x) = Ku t (x), 
which in two dimensions reads, 

u xx (x, t/, t) + u yy (x, y, t) = K u t (x,y, t), 
where k is the product of the resistivity (resistance "per square") and the 
capacitance per unit area. The steady state of such a sheet is given by the 
solution of Laplace's equation, since u t = in the steady state. 

If the potential at time t = is forced to equal some given input function, 
U(x,y) say, we obtain, at a later time, the convolution of the input function 
with some Gaussian kernel [Courant k Hilbert 62]. The result can be written 
in the form 



u 



(x, y ,t) = J^mi)^- *-^'-* #*,, 



as can easily be verified by taking the required partial derivatives, and noting 
that the Gaussian 

1 _« 2 +y 2 

e 2<r2 

27R7 2 

becomes the unit impulse function as a tends to zero. Here the standard 
deviation of the Gaussian is given by 

<7 = y/2nt. 
Convolutions with Gaussians of different widths can be obtained by waiting 
varying amounts of time. This is the observation exploited by Thomas Knight 
in his Gaussian convolver chip design [Knight 83]. Note that the spatial scale 
(<j) of the Gaussian only grows with the square root of time. 

In an edge detector, one is actually looking for derivatives of the convolved 
output, or combinations of derivatives, such as, for example, the Laplacian 5 . 
In the scheme above, the partial derivatives have to be computed in a separate 
step. Since differentiation is a linear shift-invariant operation, it commutes 
with the Gaussian convolution step. Consequently these operations can be 

The Laplacian of the Gaussian can be approximated by the difference of two 
Gaussians of different widths. This, however, is a good approximation only when 
the two Gaussians have almost exactly the same width. The results of convolu- 
tions with two Gaussians of almost the same width will be very similar, and so 
numerical problems arise when the results are subtracted. 
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performed in either order. A subtlety often overlooked is that the result of 
convolution with a Gaussian is likely to be very smooth and so differences 
of quantities that are nearly equal have to be taken when derivatives of the 
result are to be found. If the intermediate result is quantized (or noise added), 
the differenced output may in extreme cases be almost totally worthless. It 
is better then to apply the discrete difference operator first, and then smooth 
the result. This approach, however, increases the amount of computation 
required when more than one directional derivative is to be estimated, since 
the Gaussian convolution has to be repeated for each derivative. 



1.6. Non-clocked Approach 

The approach given in the previous section for obtaining the convolution of 
an image with a Gaussian requires that the input be loaded into the network 
at a certain time, the input then disconnected and the output read a fixed 
time later. It may be more convenient to have a non-clocked system, where 
the input is applied continuously to one end of a three-dimensional, layered 
resistive network, while the output is available continuously at the other end. 
If Gaussians of differing width are needed, these can be read out from inter- 
mediate layers of the network. Ideally, it should also be possible to directly 
read out the Laplacian of the Gaussian at any stage. 

These objectives can be met by a system composed of several layers of 
resistive sheets, where time is in essence laid out as another spatial dimension. 
Each sheet solves Poisson's equation with an input current density that is 
forced to be proportional to the difference in potential at corresponding points 
in successive sheets. Successive layers correspond to different times in the 
solution of the heat equation by the sheet mentioned earlier, so the potential 
difference between layers, u^ n+1 \x, y) - u( n \x,y), is a discrete approximation 
of (a multiple of) u t , the time derivative. 

Now a resistor draws a current that is proportional to the difference of 
potentials on its ends. So resistors connected between corresponding nodes 
on two successive layers would appear to inject the desired currents into the 
second layer. But we cannot just interconnect adjacent sheets using a layer 
of resistors, since the current they inject into one sheet is extracted from the 
other sheet. This disturbs the solution on the earlier sheet. In fact, such a 
network solves Laplace's equation in three dimensions, rather than the heat 
equation in two dimensions. The result in this case can be written in the form 
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as can easily be verified by taking the required partial derivatives, and noting 
that 

__1 1 

becomes the unit impulse function as z tends to zero. This is not what we 
want. For one thing, this smoothing function falls off much more slowly with 
radial distance than the Gaussian. 

All that we need to make the basic idea work is to add a layer of buffer 
amplifiers that copy the potential of one sheet without drawing any current 
from it. The outputs of these amplifiers are then connected to the next sheet 
by means of suitably chosen resistors (equivalently, we can use a differential 
transconductance amplifier with high input impedance, that produces a cur- 
rent at its output proportional to the potential difference between input and 
output [Mead 89]). 

1.7. Convolution with the Laplacian of the Gaussian 

If we wish to extract the Laplacian of the Gaussian, we can read out the 
currents in the resistors (or the buffer amplifiers) connecting successive layers. 
The reason is that these correspond to the time derivative u t in the heat 
equation and hence are proportional to the Laplacian, since 

u xx (x, y, t) + u yy (x, y,t) = K u t (x, y, t). 
By reading out the currents in different interconnecting layers, we can obtain 
convolutions with the Laplacians of Gaussians of different widths. Usually we 
are only interested in the zero-crossings of the result, so we may not need to 
extract all of these measurements. Instead, neighborhoods are located where 
some currents are positive while others are negative. 

Actually, interpolation can be used to recover the location of the edge 
fragments to considerably better accuracy than the spacing between nodes 
in the network of resistors. In digital simulations one finds almost an order 
of magnitude improvement in resolution when the signal is reasonably free 
of noise. There is then a tradeoff between a network with very many simple 
nodes versus a network with fewer complex nodes capable of supporting the 
interpolation process. This may be an issue of considerable importance if there 
is a limit on the total number of nodes that can be conveniently constructed 
using a particular fabrication technology. 

Effects of Discretization 

In practice we typically have to discretize the continuous analog system that 
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computes the desired convolution. For a start, we are now using discrete layers 
in the ^-direction (which was the time direction before). This means that the 
convolution we obtain is actually not with a Gaussian, but the zero-th order 
modified Bessel function, K (r). The result can be written in the form 

u^ix, y) = fJ D „(»)({, n) g K Q (ym/(* - 2 + (y - >?) 2 ) <*e ^ 

where g is the conductance per unit area of the material connecting the 
(buffered) output of the n-th layer to the (n + 1) layer. The zero-th or- 
der modified Bessel function is not a particularly good approximation to the 
Gaussian— for one thing, it has a singularity at the origin 6 . Fortunately, if we 
repeat this convolutional operation many times we obtain an effective over- 
all response that is close to Gaussian, as a consequence of the central limit 
theorem 7 . 

The convolutional operator changes once again when the continuous re- 
sistive sheet is replaced by a regular discrete grid of resistors. No closed form 
solution is known in this case, for either square or hexagonal tesselations, 
although the response can be estimated readily using numerical techniques. 

It is also interesting to compare this scheme, derived by mapping the 
time dimension into a third spatial dimension and then discretizing, with the 
binomial filter scheme discussed earlier. One of the differences between the 
two schemes is that there are no resistors between layers in the binomial filter 
scheme, connections are made directly to the outputs of the buffer amplifiers 
from the previous layer. Another difference is that the output nodes are 
distinct from the input nodes in the binomial filter scheme, whereas all nodes 
act as both inputs and outputs in the scheme discussed here. In other respects 
the two methods are similar. 



1.8. Multiple Scales 

The information is smoothed out more and more as it flows through the layers 
of such as system. Consequently we do not need to preserve full resolution in 
layers further from the input. Very roughly speaking, the information is low- 
pass filtered and so fewer samples are required to represent it. This suggests 
that successive sheets could contain fewer and fewer nodes. 

Note also that it would be difficult indeed to superimpose, in two di- 
mensions, multiple layers of the three dimensional network described above, 

6 Ko(r) » — logr for small r. 
Another possibility is to use networks that solve the discrete analog of the bi- 
harmonic equation. In this case the resulting convolutional kernel is a better 
approximation to the Gaussian — for one thing, the kernel does not have a singu- 
larity at the origin [Poggio et al. 85] [Harris 89]. 
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if each of them contained the same (large) number of nodes. Now if, instead, 
a particular layer contains only l/k times as many nodes as the previous layer 
then the total number of nodes is less than 

k 

k-1 
times the number of nodes in the first layer. If, for example, we reduce the 
number of nodes by one half each time, then a network containing a finite 
number of layers has less than twice the number of nodes that the first layer 
requires. (If we reduce the number of nodes to a quarter each time, then the 
whole network has less than 4/3 times as many as the first layer.) 

1.9. Growth of Standard Deviation with Number of Layers 

Another argument for sub-sampling is that, if all the layers and the intercon- 
nections are the same, then the width of the Gaussian grows only with the 
square root of the number of layers, as can be seen from the form of the explicit 
solution of the heat equation given earlier. This suggests that arrangements 
be made to ensure that k varies from layer to layer. We can increase k either 
by decreasing the resistances in the sheets themselves, or by decreasing the 
conductance in the interconnecting layers. But note that if successive layers 
contain fewer nodes, while the resistances between nodes are kept the same, 
then k in effect is increased automatically. This can be exploited to attain 
exponential growth of the effective width of the Gaussian with the number of 
layers. 

In the case of a square grid of nodes, a simple scheme would involve 
connecting only one cell out of four in a given layer to the next layer. This 
corresponds to a simple sub-sampling scheme. Sampling, however, should 
always be preceded by low-pass filtering (or at least some sort of smoothing) 
to limit aliasing. A better approach therefore involves first computing the 
average of four nodes in a given 2x2 pattern in order to obtain a smoothed 
result for the next layer 8 . Each cell in the earlier layer contributes to only 
one of the averages being computed in this scheme. 

The average could be computed directly using four resistors, but these 
would load down the network. The average can be computed instead using 
resistors connected to buffer amplifiers. Each cell in the earlier layer feeds 
a buffer amplifier and the output of the amplifier is applied to one end of a 
resistor. The other ends are tied together in group of four and connected to 

Naturally, since this is not an ideal low-pass filter, some aliasing effects cannot 
be avoided. In fact, the resulting transfer function goes through zero not at the 
Nyquist frequency, but only at twice that frequency, but this is much better than 
not doing any smoothing at all. 
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the nodes in the next layer. Note that the nodes of the latter sheet should be 
thought of as corresponding to image locations between those of the earlier 
sheet, rather than lying on top of a subset of these earlier nodes. But this 
subtlety does not present any real problems. 



1.10. Layout of Interlaced Nodes 

A four-to-one reduction in number of nodes is easy to visualize and leads to 
rapid reduction in the number of nodes in successive layers, but it does not 
yield a very satisfactory discrete approximation to the original continuous do- 
main equation. A better approximation can be attained if the number of nodes 
is reduced only by a factor of two. Note that in this case the total number of 
nodes in any finite number of layers is still less than twice the number of nodes 
in the first layer. An elegant way of achieving the reduction using a square 
grid of nodes is to think of successive layers as scaled spatially by a factor of 
y/2 and also rotated 45° with respect to one another. Once again, each of the 
new nodes is fed a current proportional to the difference between the average 
potential on four nodes in the earlier layer and the potential of the node itself. 
This time, however, each of the earlier nodes contribute to two of these aver- 
ages rather than just one, as in the simple scheme described in the previous 
section. A node receives contributions from four nodes that are neighbors of 
its ancestor node in the earlier layer, but it receives no contribution directly 
from that ancestor. 

An elegant partitioning of a square tessellation into sub-fields may be 
used in the implementation of this scheme in order to develop a satisfactory 
physical layout of the interlaced nodes of successive layers of this network 
(Robert Floyd drew my attention to this partitioning in the context of paral- 
lel schemes for producing pseudo grey-level displays on binary image output 
devices [Floyd 87]). This leads to the interlaced pattern shown in Figure 1, 
where each cell is labelled with a number indicating what layer it belongs to. 

This scheme leads to an arrangement where the nodes of the first layer 
are thought of as the black cells in a checkerboard. The white cells form a 
diagonal pattern with \/2 times the spacing of the underlying grid. We can 
now consider this new grid as a checkerboard, turned 45° with respect to the 
first. The black cells in this checkerboard belong to the second layer. The 
remaining white cells form a square grid aligned with the underlying grid but 
with twice the spacing between nodes. Considering this as a checkerboard in 
turn, we let the black cells be the nodes of the third layer, an so on . . . 

Note that one half of the cells are labelled 1, one quarter are labelled 2, 
one eigth are labelled 3 and so on. The top left node, labelled 0, does not 
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Figure 1: A way to interlace nodes of several layers of a multi-scale net- 
work so they can be laid out on a two-dimensional surface. The network 
containing nodes labelled (n + 1) has half as many nodes as the network 
whose nodes are labelled n. The total number of nodes is less than twice 
the number of nodes in the finest layer. 



belong to any of the partitions. If we consider the nodes labelled with their 
row number i and there column number j, both starting at zero at the top 
left node, we find that a node belongs to layer k if the binary representation 
of i 2 + j 2 has k - 1 trailing zeros! 
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2. Coupled Poisson's Equation for Interpolation 

Uniform resistive networks that solve Poisson's and Laplace's equations have 
many other applications. One is in interpolation, where data may be provided 
on just a few contours, as happens in the edge matching approach to binoc- 
ular stereo [Grimson 81] 9 . Many modern interpolation methods are based on 
physical models of deformation of elastic sheets or thin plates. So these are 
briefly reviewed here first. 

2.1. Mathematical Physics 

An elastic membrane takes on a shape that minimizes the stored elastic energy. 
In two dimensions the stored energy is proportional to the change in area of 
the membrane from its undisturbed shape, which we assume here is flat. The 
area is given by 

JJ y/l + *l + *}dxdy, 

where z(x,y) is the height of the membrane above some reference plane. If 
the slope components z x and z y are small, 

Thus the membrane minimizes 

// {zl+z 2 y )dxdy, 

provided that the partial derivatives z x and z y are small. A unique minimum 
exists if the sheet is constrained to pass through a simple closed curve dD on 
which the height is specified. The Euler equation for this calculus of variation 
problem yields 

z xx + z yy = or Az = 0, 
except on the boundary where the height z(x,y) is specified [Courant & 
Hilbert 53]. 

The problem of interpolation is harder if data is given only on a sparse set of 
points, as opposed to contours. Consider, for example, Laplace's equation with 
some constant value specified on a simple closed curve with a different value given 
at a single point inside the curve. The solution minimizes the integral of the sum 
of squares of the first partial derivatives. It turns out that this is not a well-posed * 
problem, since there is not a unique solution. One of the "functions" that mini- 
mizes the integral takes on the value specified on the boundary everywhere except 
at the one point inside where a different value is given. Clearly no "interpola- 
tion" is occurring here. This problem is not widely discussed, in part because the 
discrete approximation does not share this pathological behaviour [Grzywacs & 
Yuille 87]. 
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2.2. Interpolation from Sparse Contours and Isolated Points 

The above equation has been proposed as a means of interpolating from sparse 
data specified along smooth curves, not necessarily simple closed contours. 
We explored the use of this idea, for example, in generating digital terrain 
models from contour maps in our work on automated hill-shading ([Strat 77] 
and [Horn 79, 81, 83]) as well as in remote sensing ([Bachmann 77], [Horn 
& Bachmann 78] and [Sjoberg & Horn 83]). Some undergraduate research 
opportunities project work was based on this idea [Mahoney 80], as were the 
bachelor's theses of [Goldfinger 83], and [Norton 83]. Recently, a 48 x 48 
cell analog chip has been built to do this kind of interpolation [Luo, Koch & 
Mead 88]. 

The result of elastic membrane interpolation is not smooth, however, 
since, while height in the result is a continuous function of the independent 
variables, slope is not. Slope discontinuities occur all along contour lines, and 
the tops of hills and bottoms of pits are flat 10 . 

This is why we decided to use thin plates for interpolation from contour 
data instead. The potential energy density of a thin plate is 

, ( 1 1 \ 2B 

\Pl p\) P1P2 

where A and B are constants determined by the material of the plate, while 
pi and p 2 are the principal radii of curvature of the deformed plate [Courant 
k Hilbert 53]. Again, assuming that the slopes z x and z y are small, we can 
use the approximations 



1 1 



ZxxZiiu Z. 



Pi + P2 * ( Z * X + Zy ^ and ~^2 ~ * X **" ~ ' Xy ' 

This allows one to approximate the potential energy of the deformed plate by 
a multiple of 



//. 



((*** + z yy f - 2(1 - fi)(z xs z„ - z\ )) dx dy, 

where /i = B/A. If the material constant // happens to equal one, this sim- 
plifies to the integral of the square of the Laplacian: 



//. 



( Az) 2 dx dy. 

'D 

The Euler equations for this variational problem lead to the bi-harmonic equa- 
tion 

A(Az) = 0, 



Discontinuities in slope are not a problem for many applications of interpolated 
depth or range data. Shaded views of the surfaces, however, clearly show the 
discontinuities, since shading depends on surface orientation. 
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except where the plate is constrained. This fourth-order partial differential 
equation has a unique solution when the height z(x,y), as well as the normal 
derivative of z(x,y) are specified on a simple closed boundary 3D. 

It turns out that the same Euler equation applies when the material 
constant // is not equal to one, because (z xx z y y-z 2 xy ) is a divergence expression 
[Courant & Hilbert 53]. Solution of the bi-harmonic equation, while involving 
considerably more work than Laplace's equation, produces excellent results in 
interpolation from contours. Iterative methods for solving these equations are 
available (see for example [Horn 86]). Some obvious implementations may not 
be stable, particularly when updates are executed in parallel, so care has to be 
taken to ensure convergence. The problem is that computational molecules or 
stencils with negative weights are needed, and these can amplify errors with 
some spatial frequencies rather than attenuate them. (The corresponding 
system of linear equations is not diagonally dominant.) This issue is not 
pursued any further here. The proper way of dealing with boundary conditions 
is also not discussed here, for details, see the cited references. 

The same methods where used in interpolation of surface depth from 
stereo data along brightness edges [Grimson 81, 82, 83]. Grimson observed 
that the null-space of the quadratic variation (z\ x + 2z\ y + z 2 yy ) is smaller than 
that of the squared Laplacian (Az) 2 , and so decided to use the quadratic 
variation as the basis for his binocular stereo interpolation scheme. This 
corresponds to choosing // = 0. Note that this affects only the treatment of 
the boundary; one still solves the bi-harmonic equation inside the boundary. 

The methods discussed here rapidly get rid of high spatial frequency 
components of the error, but may take many iterations to reduce the low 
frequency components. The number of iterations required grows quadratically 
with the width of the largest gap between contours on which data is available. 
Efficient multiresolution algorithms were developed to speed up the iterative 
computation of a solution [Terzopolous 83]. Terzopolous also applied these 
ideas to variational problems other than interpolation [Terzopolous 84]. 

2.3. Resistive Networks for the Bi-Harmonic Equation 

It is clear that methods for solving the bi-harmonic equations are important in 
machine vision. Unfortunately, simple networks of (positive) resistances can 
not be constructed to solve discrete approximations of this equation. Compu- 
tational molecules or stencils [Horn 86] for the bi-harmonic operator involve 
negative weights and connections to nodes two steps away. 

It is of interest then to discover ways of using methods for solving Pois- 



'//. 
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son's equation 

Az(x,y) = f{x,y) 
in the solution of the bi-harmonic equation, since simple resistive networks 
can be constructed to solve Laplace's equation. One simple idea is to use the 
coupled system, 

Az(x, y) = u(x, y) and Au(x, y) = f(x, y), 
since here 

A(Az) = Au = f(x,y). 
The constraints on z(x,y) can be handled easily in this formulation, but con- 
straints on the partial derivatives of z(x,y) are harder to incorporate. This 
idea will not be pursued further here. 

An alternative explored recently by Harris [Harris 86] involves minimiza- 
tion of the functional 

((** " P? + (z y - qf + \(p 2 x + p\ + q 2 x + q 2 y )) dx dy. 

The Euler equations for this calculus of variation problem yield 

Az = p x + q y , 

\Ap = p- z x , 

\Aq = q-z y . 
In this scheme, three coupled Poisson's equations are used, each of which can 
be solved using a resistive network. Constraints on both z(x,y) as well as z x 
and z y can be incorporated. 

The relationship to the problem of solving the bi-harmonic equation can 
be seen by expanding 

A{Az) = A( Px + q y ), 
and noting that differentiation and application of the Laplacian are linear 
operations, so that they can be interchanged: 

A( Px ) = (Ap) x = (1/A)Q> - z t ) x = (l/A)( Pl - z xx ) 
A(q y ) = (Aq) y = (1/A)(« - *, ), = (1/A)fo, - z„) 
and finally 

A(Az) = (1/A)((p, + q y ) ~ (z xx + z yy )) = 0, 
since Az = (p x + q y ). (Note that this does not necessarily imply that p = z x 
and q = z y .) 

This scheme is reminiscent of the one developed by Horn for recovering 
depth z(x, y) y given dense estimates of the components p and q of the gradient 
of the surface (as used in [Ikeuchi 84], and described in [Horn k Brooks 86]). 
There one minimizes 



// 



(zx ~P) 2 + (z y -q) 2 dxdy, 

D 
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for which the Euler equation yields 

Az=p x + q y , 

where p and q are here the given estimates of the components of the surface 
gradient. In Harris's scheme we do not have these estimates at all points, 
instead we are given z at some points, and some linear combination of p and 
q at some other points. 

2.4. Application to Shape from Shading 

Solution of the shape from shading problem, in the special case when the light 
sources and the viewer are far away in relation to the size of the object being 
viewed, revolves around the nonlinear first-order partial differential equation 

E(x,y) = R(z x (x,y), z y (x, y)) , 

the so-called image irradiance equation [Horn 86]. Solutions may be obtained 
by the method of characteristic strip expansion [Garabedian 64]. The results 
can be improved considerably by solving characteristics in parallel so as to 
permit application of a sharpening method to the propagating solution wave- 
front [Horn 70]. This method adjusts estimated surface orientations according 
to local brightness measurements and so reduces the propagation of errors 
inherent in methods based on brightness gradients. 

None of these methods lend themselves to parallel implementation on 
a regular grid registered with the picture cells [Horn 70]. Alternatives have 
been explored that lead to methods similar to those used for solving second- 
order elliptic partial differential equations [Woodham 77] [Strat 79] [Ikeuchi 
& Horn 81]. It is difficult to come up with a convergent iterative scheme that 
directly gives height z(x,y) above some reference plane [Brooks & Horn 85] 
[Horn & Brooks 86]. Most methods instead compute estimates of surface 
orientation. These may not be integrable in the sense that they may not be 
consistent with any underlying surface. For example, if the method recovers 
estimates of p = z x and q = z y , it may be that p y ^ q x . This problem can be 
solved by finding the "nearest" integrable surface after each iteration [Frankot 
k Chellappa 87]. 

The nearest integrable surface may be found using a set of basis func- 
tions that are integrable, or by the method described above for recovering 
z(x,y) from p(x, y) and q(x,y). Numerical estimates of the derivatives of the 
computed z(x,y) are then used as starting values for the next iteration. 

It is possible to combine the iterative scheme for solving the shape form 
shading problem with that for projecting the solution onto the space of inte- 
grable solutions. Suppose for example that we wish to minimize the following 
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functional: 
JJ{E-Rip,q)) 2 + Kpl+pl + ql + ql)+^(z I -p) 2 + (z y - q Y)dxdy. 

The Euler equations for this calculus of variations problem are 

XAp = -(E - R)R p + fi(p - z x \ 

XAq = -(E - R)R q + fi(q - Zy ), 

Az = p x + q y . 

This illustrates that the methods discussed here for interpolation from sparse 
data have applications in other areas as well. 
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3. Moment Calculations for Position and Orientation 

Calculations of sums of products of image coordinates and functions of the 
picture cell grey-levels are useful in several early machine vision algorithms. 
These moments are easily calculated using many different architectures, in- 
cluding bit-sliced, pipelined, analog networks, and by means of charge coupled 
devices. Such methods have several applications. A new technique for directly 
estimating motion of the camera from first derivatives of image brightness, for 
example, depends on the calculation of such moments (as discussed in the next 
section). 

In addition, a large fraction of all binary image processing methods in- 
volve the computation of the zeroth, first and second moments of the regions of 
the image considered to be the image of one object. Presently, most commer- 
cially available machine vision systems have only rudimentary mechanisms for 
dealing with grey-level images and are aimed mainly at binary images. These 
systems typically have digital means for computing the moments. While such 
systems are restricted in their application, they are widely available and well 
understood. They can be used, for example, to determine the position and 
orientation of an isolated, contrasting workpiece lying flat on a conveyor belt 
(see, for example, Chapter 3 in [Horn 86]). Once the position and orientation 
of the object is known, a robot hand with the appropriate orientation may 
be sent to the indicated position to pick up the part. A device that finds the 
centroid of a spot of light in the image can also be used as a high-resolution 
light-pen and a means of tracking a light source, such as a light bulb attached 
to an industrial robot arm. 

A variety of methods is available for efficiently computing the zeroth- and 
first-order moments, including methods for working with projections of the 
image or run-length coded versions of the image. Less appears to be known 
about how to easily compute second- and higher-order moments, except that 
iterated summation can be used to avoid the implied multiplications. Such 
ideas are used in special purpose digital chips that that have been built for 
finding moments [Hatamian 86, 87]. We nevertheless explore analog networks 
for this task, partly to see whether they may have advantages over existing 
digital implementations, but mostly because they constitute a stepping stone 
on the way to networks for the recovery of motion from time- varying images. 
Analog circuitry for the motion vision task share many of the features of the 
simple moment generating circuits, but are more complex. 

In this section several different methods are explored for computing mo- 
ments using analog networks. It will be shown that some elegant methods 
exist that make it possible to obtain these moments using networks with rel- 
atively few components. 
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3.1. Use of First Moments for Position 

Suppose that we have a characteristic function that indicates places in the 
image where the object region is thought to be. That is, 

u x ^ y) = l 1 > if (*' y ) is in the re s ion ; 

\ 0, otherwise. 
Under favorable circumstances, such a characteristic function can be obtained 
by thresholding a grey-level image. The area of the object is obviously just 
the zeroth-order moment 

A = // b(x,y)dxdy, 

where the integral is over the whole image. 

The position of the object can be considered to be the location (x,y) of 
its center of area, defined in terms of the two first-order moments as follows: 

Ax = xb(x,y)dxdy and Ay = yb(x,y)dxdy. 

The center of area, or centroid is independent of the choice of coordinate 
system 1 J . 

3.2. Use of Second Moments for Orientation 

There are three second-order moments, and these can be used to define the 
orientation of the object as well as a shape factor. The orientation of the object 
may be taken to be specified by the direction of the axis of least inertia, which 
is independent of the choice of coordinate system axes 12 . 

The inertia of a particle relative to a given axis is the product of the mass 
of the particle and the square of the perpendicular distance of the particle form 
the axis. So the inertia of an extended object about an arbitrary axis in the 
image plane can be defined as 



= // r 2 (x,y)b(x,y)dxdy, 



1 = 

where 

r(x, y) = x sin B — y cos 6 + p 
is the distance of the image point (z, y) from the line with inclination 6 (mea- 
sured anti-clockwise form the x-axis) and perpendicular distance p from the 
origin. 



That is, its position of the centroid relative to the object does not depend on the 
choice of coordinate used in the calculation. 
1 If we rotate the coordinate system, we find that the axis of least inertia determined 
in the new coordinate system is just the rotated version of the axis of least inertia 
in the original coordinate system 
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It is easy to show that the axis of least inertia passes through the center 
of area, so it is convenient to compute the second-order moments with respect 
to the center of area (see, for example, Chapter 3 in [Horn 86]). Let 



a' = 1 1 x' b(x,y)dxdy, 
b' = / / x'y'b(x,y)dxdy, 

c ' ~ II y ,2 K x iy)dxdy, 



where x' = (x — x) and y' = (y — y). The inertia can then be expressed as a 
function of the angle of inclination of the axis in the form 

I = -{a! + c ') + _( c ' _ a ') cos 20 - V sin 20. 

Differentiating this with respect to and setting the result equal to zero yields 

(c' - a!) sin 20 o + 2 6' cos 20 o = 0, 

for the inclinations of the axes corresponding to extrema of inertia. Note 
that we don't actually need all three of the second-order moments to compute 
0o, only the combination (c' - a') and 6' are required. This observation is 
exploited later in a circuit designed to find the orientation of the axis of least 
inertia. 

There is, by the way, a two-way ambiguity here, since the equation is 
satisfied by (6 + tt) if it is satisfied by O . This is to be expected, since we are 
only finding the line about which the region has least inertia. Higher order 
moments can be used to resolve this ambiguity, but we will not pursue this 
subject any further here. 

The axis through the center of area yielding maximum inertia lies at right 
angles to the axis yielding minimum inertia. The maximum and minimum 
inertia themselves are given by 

/™* = \(a' + c') + I^ + Cc' -«<)*, 

Imin = \(o! + c') - i^4' 2 + (c'-a') 2 . 

The ratio of I mm to 7 max is a factor that depends on the shape of the object. 
It will be equal to one for a centrally symmetric object like a circular disc 
and near zero for a highly elongated object. Note that we need all three 
second-order moments to compute a "shape factor." 

So-called moment invariants are combinations of moments that are inde- 
pendent of translation and rotation of the object region in the image [Cagney 
& Mallon 86]. The second order moment invariants are all combinations of the 
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minimum and maximum inertia. There are thus only two degrees of freedom. 
One may choose any convenient combinations, such as 

■*max I J-min = a ~r C , 
(/max-/min) 2 =6' 2 + (c'-a') 2 . 

These invariants are sometimes used in recognition. 

3.3. Additional Comments and Higher Moments 

In practice the double integrals that apply in the continuous domains are 
replaced by double sums, in the obvious way. So the area, for example, is just 
(a multiple of) 

n m 

t=i j=i 
The second-order moments a', 6', and c', relative to the centroid (x,y), can 
be computed from the moments a, 6, and c relative to the (arbitrary) origin 
of the coordinate system, provided that the zeroth and first-order moments 
are known: 

a'=a — Ax 2 , b' = b — Axy, and c' = c — Ay 2 . 

Still higher moments may be used to get more detailed descriptions of the 
shape. Also, as noted, the axis of least inertia leaves an ambiguity in ori- 
entation. The third-order moments can be used to disambiguate the two 
possibilities. 

We have assumed so far that b(x,y) can only take on two values. It 
should be obvious that the same analysis holds when b(x,y) is not binary 
(yet independent of accidents of lighting and viewing geometry). This may be 
advantageous, for example, when one has a coarsely sampled image, in which 
case the position and orientation of the part may not be determined very ac- 
curately from a mere binary image because of aliasing problems. Intermediate 
grey-levels on the boundary of the object can provide information that allows 
one to determine the position and orientation to much high precision. 

3.4. Methods for Computing Moments 

There are many methods for efficiently computing moments. It is possible, 
for example, to avoid the multiplications appearing in our simple definition of 
the moments by repeated summation. Note, for example, that 

n n i n n 

E'/< = ££* = ££/, 

i=1 t=l j=l j=\ i=j 
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so that this sum can be computed using the coupled multiplication-free iter- 
ative scheme 

s» = St+i + fi and Sj = Sj+i + Sj, 
with s n+1 = and S„+i = 0. The total is given by Si. A similar scheme 
using two intermediate sums can be used to obtain a second moment and so 
on. 

In serial digital implementation, run-length coding can be used to ad- 
vantage. Further, projections can dramatically compress the information. A 
projection at an arbitrary angle 9 is given by 

Pe(t) = // b(x,y)6(x cos 9 + y sin 9 -t)dxdy, 



or 



Pe(t) = / b(t cos — ssm9,tsin9 + s cos 9) ds, 

where L is the straight line x cos 9 + y sin 9 = t. It can be shown that all n-th 
order moments can be computed from (n + 1) projections. 

Consider, in particular, the vertical and the horizontal projections (where 
9 = 0, and 9 = 7r/2 respectively): 

v(x) = / b(x,y)dy and h(y) = / b(x,y)dx. 

The integral of either projection gives us the area A, while 

Ax = xv(x)dx and Ay = yh(y)dy, 

gives us the center of area. Three projections are needed to calculate the 
second moments. We can use the two we already have, plus a projection in a 
diagonal direction (see, for example, Chapter 3 in [Horn 86]): 

The moments involving x 2 and y 2 can be obtained straightforwardly from 
v(x) and h(y), while the moment involving xy can be computed as follows: 

J J xy b(x, y) dxdy = j t 2 d(t) dt - | f x 2 v(x) dx - | / y 2 h(y) dy. 

In the case of a hexagonal grid, the three projections can be conveniently 
taken in directions spaced 120° apart. Working with projections dramatically 
reduces the amount of arithmetic required in a digital implementation. 

3.5. Feedback Method for Computing Center of Area 

The computation suggested by the equations above can be embodied in hard- 
ware in many different ways. One fairly obvious implementation of the first- 
order moment calculation uses linear resistive chains to obtain a potential 
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proportional to x at every picture cell, and a switch that injects a current 
proportional to x into a global bus wherever b(x,y) = 1. The bus is termi- 
nated in a resistor; its potential represents Ax. A similar arrangement is used 
to compute Ay. A third bus is used to compute a potential proportional to 
the area A itself. (If b(x,y) is not binary, analog multipliers will need to be 
used at each picture cell in order to obtain the required products.) 

The computation can also be performed by first obtaining the horizontal 
and vertical projections. (In the case of grey-level information, the number of 
analog multipliers is drastically reduced by first computing these projections.) 
A one-dimensional circuit, using analog multipliers, then computes the one- 
dimensional centroid of each of the two projections. Note that the projections 
may also be processed off chip and that the projections may be obtained using 
charge coupled device technology. 

Other possibilities abound. A particularly simple method involves a feed- 
back scheme obtained by noting that the inertia about an axis perpendicular 
to the image plane is minimized when this axis passes through the centroid. 
That is, we have to find x and y such that 

E = II ^ ~ ^ + (2/ " ^ 6(X ' V) dX dy ' 
is made as small as possible. The derivatives of this integral with respect to 
x and y are 

~fa = ~ 2 11 {^-x)b{x,y)dxdy and — = -2 / / (y - y)b(x,y)dx 

We can use a gradient descent method to solve the least squares problem. 
This leads to the scheme: 

~dt =a II ( x -*) b ( x 'y) dxd y and Jr = a // (y-y)Kx,y)dxd y , 

where a is a gain factor that controls the speed of adjustment of the estimates 
of x and y. When the circuit settles, the time derivatives are zero and the 
extremum has been reached. 

A simple implementation of this idea uses a global bus with potential 
proportional to the present estimate of z, the x-component of the center 
of area. Each picture cell injects a current proportional to the difference 
between its x coordinate and the bus potential x, provided that it is in the 
object region, that is, if b(x,y) = 1. The current can be generated easily by 
a resistor that is connected between a potential follower that buffers the x- 
coordinate potential and the bus for x. A switch connects this resistor to the 
buffer amplifier wherever b(x,y) = 1. A similar arrangement is used for y, the 
y component of the center of area. Both busses are terminated in capacitors 



dy. 
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connected to ground 13 . 

In the equilibrium state, the currents injected into each of the two busses 
add up to zero. There is no need in this scheme to compute the area separately, 
if only the center of area is needed. A discrete analog chip has in fact been 
built that determines the centroid using a method like this [DeWeerth h 
Mead 88]. 

3.6. Feedback Schemes for Orientation 

The computation of orientation is a little harder, since it involves second- 
order moments and requires that the center of area be known. The axis of 
least inertia passes through the centroid and the distance of an image point 
from the axis is just 

r(x , y ) = x' sin — y' cos 0, 
where x' = (x - x) and y' = (y - y). We could try to develop a feedback 
scheme for computing the angle directly, but this would require evaluation 
of trigonometric functions and some way of letting the representation "wrap 
around" when 9 exceeds +7r or becomes less than — ir. It is better to use 
a redundant representation for orientation. One can, for example, use two 
quantities, c and s, proportional to the sine and cosine of 6. The inertia 
integral then can be written 



= // (x's-y'c) 2 b(x,y)dxdy. 



The only difficulty with a scheme based on this approach is the need to keep 
the two quantities consistent, that is, some way of ensuring that c 2 + s 2 — 1 =0. 
One way of dealing with this constraint is to add a Lagrange multiplier 
term to obtain the modified inertia integral 

I' = fj (x's - y'c) 2 b(x,y) dx dy + A (c 2 + s 2 - 1). 

We might then consider differentiating with respect to c, s, and A to obtain 
the gradient of the modified inertia integral: 



— = -2 / / (x's - y'c) y' b(x,y) dx dy + 2Ac, 

■i t r 

- = +2 / / (x's - y'c) x' b(x, y) dx dy + 2As, 

(c 2 + s 2 - 1). 



dl 

ds 
dV_ 
d\ 



It may be sufficient to use the parasitic capacitance of the busses, provided there 
are no stability problems resulting from unmodeled effects. 
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By adding c times the first derivative to s times the second derivative, we see 
that at the stationary point, 

»-//„<*.-•*.<.,,*«.--,. 

Subtracting s times the first derivative from c times the second derivative, we 
also see that at the stationary point, 



sc 



[J (y' 2 - x' 2 ) dx dy + (c 2 -s 2 ) [I *V dx dy = 0. 



If we use polar coordinates here for a moment and let c = pcosO and s = 
p sin 0, we note that 

/' = p 2 If (*' sin - y'cosS) 2 b(x,y)dxdy + \(p 2 - 1). 
Now 

lL = -2p 2 II ((y' 2 -x' 2 )sm9cos9 + x'y'(cos 2 9-sm 2 9)) dxdy, 
which, of course, is zero at the stationary point. Also 

-1L = -2p 2 II ((y' 2 - x' 2 ) (cos 2 9 - sin 2 9) - 4a?'y' sin 9 cos #) dx dy, 

At the minimum of /, this equals 

ff D (x' 2 +y' 2 )b(x,y)dxdy 

J(ff D x>y>b(x,y)dxdy) 2 + (ff D (y> 2 - x> 2 )b(x,y)dx dy) 2 

So the second derivative of /' with respect to 9 is positive at the minimum 
(and it is negative at the maximum). Unfortunately, the second derivative of 
V with respect to p is zero there, given the value computed above for A. Also, 
the second derivative of V with respect to A is always zero. 

We conclude that the minimum of the original inertia integral, / (a func- 
tion of the single variable 9), does not correspond to a minimum of the modi- 
fied inertia integral, V (a function of the three variables c, s, and A), but rather 
some kind of saddle point. This means that we cannot use steepest descent 
methods directly. We discuss a novel way of dealing with this problem in the 
next section that requires inverting the sign of the gradient component in the 
A direction and the addition of a penalty term proportional to (c 2 + s 2 — l) 2 
(see also [Piatt & Barr 88]). But in this particular case we can avoid this 
complication by noting that when the solution has been found, 



A = — // (x's — y'c) 2 b(x,y)dxdy. 



This provides a way of estimating A at any given stage of the computation 
that we can use instead of gradient descent for finding a new value of A. 
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Another way of dealing with the problem is to make adjustments only 
in directions that keep the value of c 2 + s 2 constant. This can be done by 
removing the component of the gradient that is along the normal to the con- 
straint curve defined by c 2 + s 2 — 1 = 0. This gradient projection method leads 
to useful feedback schemes for finding the minimum. Gradient projection is 
discussed in more detail in the next section. 

3.7. Normalization of the Inertia Integral 

In this particular case here we can use an approach that is a bit simpler 
than gradient projection, because the integral we are trying to minimize has a 
particularl scaling property. Note that if we multiply c and s by some constant 
k, the integral is just multiplied by k 2 . This suggest that we can circumvent 
the difficulty noted above simply by normalizing the integral by dividing by 
(c 2 +s 2 ). This makes the result independent of the scale of c and s. To force c 
and s to have the correct scale, we can then add a penalty term proportional 
to the square of the error in the constraint c 2 -f s 2 — 1 = 0, so that overall we 
now have to minimize 



— - if (x's - y'c) 2 b(x,y) dxdy + /i(c 2 + s 2 - l)' 



/" _ * // („*„ _„'.\2l,L *.\ J„J*. i ..(3. x „2 i\2 

C 2 + 3 s 



Note that /i is not an unknown parameter, but a quantity we can adjust to 
control the rate of convergence of the resulting system towards the condition 
c 2 + s 2 = 1. Differentiating I" with respect to c and s we obtain 

// e(x,y)b(x,y) dxdy + 4fi(c 2 + s 2 - l)c, 



dc (c 2 +s 2 ) 2 

dl" _ -2c 
ds ~ (c 2 +s 2 ) 2 



/ / c(x, y) b(x, y) dx dy + 4fi(c 2 + s 2 - l)s, 



where 



e(x, y) = x'y'(c 2 - s 2 ) + (y' - x' ) sc. 
This suggests a simple gradient descent method: 

— = -a s / / e(x,y) b(x, y) dx dy - f3c(c 2 + s 2 - 1), 

— = + a c / / e(x, y) b(x, y) dx dy - /3s(c 2 + s 2 - 1), 

where f$ = 2a \i. Note that if we omit the penalty term, that is, when ft = 0, 
then the adjustment to (c, s) is in the direction (— s,c), which is orthogonal 
to (c, s). Thus the magnitude of (c, s) is preserved by this component of the 
adjustment. The other component, arising from the penalty term, is in the 
direction (c, s) and thus does not affect the first part of the modified integral, 
only bringing the magnitude of (c, s) closer to unity. 
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The above leads to the following feedback scheme: The values of c and 
s are represented as potentials on global bus lines. The combinations sc and 
(c 2 — s 2 ) may be either computed locally, or distributed by other global bus 
lines, fed by circuits that compute these values based on the values of c and s. 
At each picture cell we compute the differences x' = (x — x) and y' = (y — y), 
as well as x'y' and (y' 2 - x' 2 ). Additional circuitry is used to obtain the error 
term 

e = x'y'(c 2 -s 2 ) + (y' 2 -x' 2 )sc. 
Finally, in places where b(x, y) = 1, currents proportional to -e s and + e c are 
injected into the global busses for c and s respectively. No current is injected 
at picture cells where b(x,y) = 0. 

The two global busses for c and s are terminated in capacitors connected 
to ground 14 . A single separate global feedback circuit is used to ensure that 
c 2 + s 2 — 1 = 0. The error in the magnitude of the sum of c 2 and s 2 is 
computed and used to inject currents proportional to —(3c(c 2 + s 2 — 1) and 
—(3s(c 2 + s 2 — 1) into the capacitors whose potentials represent c and s. 

It may appear at first sight that there is an opportunity for instability 
here, since there are two coupled first-order feedback loops. No such problem 
arises, however, since the adjustments made to (c, s) by the two systems are 
in orthogonal directions. 

There may also appear to be potential start up problem here, since the 
adjustments are zero as long as c = and s = 0. But this is not a serious 
concern, since this state is an unstable equilibrium and so any small noise 
current will cause the system to move away from it. 

In the scheme as described so far, a considerable number of local compu- 
tational elements are needed to obtain the factors x'y' and (y /2 — x' 2 ). Rather 
than computing these terms at each picture cell from values of x' and y', one 
can obtain them using two resistive grids. This is because both of these factors 
are harmonic functions, that is, they satisfy Laplace's equation, Af(x,y) = 0. 
A uniform resistive sheet solves Laplace's equation when no current is injected 
into it. If the boundary of the grid is held at a potential proportional to x'y', 
the interior will also settle to a potential proportional to x'y', since this is the 
unique solution of Laplace's equations for these boundary conditions. The 
same holds for (y' — x' ), which is actually just a scaled version of the same 
function rotated by 7r/4. This idea is explored further later in this section. 

In any case, the overall circuit will settle into one of two opposite states, 
depending on initial conditions, provided that the object imaged is not too 
symmetrical. (If the object is almost symmetrical, currents generated in one 



It may be sufficient to use the parasitic capacitance of the busses, provided there 
are no stability problems resulting from unmodeled effects. 
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image area will tend to cancel currents in other image areas and small overall 
noise currents will drive the result.) We have above, by the way, indirectly 
solved the problem of finding the eigenvector corresponding to the smallest 
eigenvalue of a 2 x 2 symmetric matrix. It may seem surprising that a fairly 
straightforward analog circuit can do this for us! We explore a generalization 
of this in the next section. 

3.8. Resistive Networks for Moment Calculation 

If area and center of area is all we are computing, then even the simple feed- 
back schemes discussed above appear to constitute overkill. Consider first a 
regular one-dimensional chain of N resistors each of resistance R. Above we 
used such a simple resistive chain to generate potentials at each node linearly 
related to the position. This potential was then used in further calculation — 
to generate a current injected into a global buss. Now consider a different way 
of using the very same chain. Suppose that the chain is grounded at each end, 
and that we can measure the currents 1/ and I r flowing into the ground at 
these points. There are k resistors to the left and (N — k) to the right of the 
fc-th node. Suppose a potential V develops at the fc-th node when we inject a 
current / there. Clearly 

V V 

Ii = -r-=- and I r = 



kR J (N-k)R' 

while the total current is 

N V 
I = I, + I r = k{N _ k) ^ 

so that 

// N-k , I r k 

_ = __ and y== __ 

We can compute the "centroid" of these two currents: 

Ii I r k 

x = xi j + x r — = x t + —{x r - xi), 

which is the x coordinate of the place where the current was injected. If we 
inject currents at several nodes, we can show, using superposition, that the 
computation above yields the centroid of the injected currents. 

Now imagine a regular two-dimensional resistive grid grounded on the 
boundary. Current is injected at each picture cell where b(x,y) = 1. The 
currents to ground on the boundary from the network are measured. The 
total current obviously is proportional to the area, that is, the number of 
picture cells where b(x,y) = 1. More importantly, the center of area of the 
current distribution on the boundary yields the center of area of the injected 
current distribution. We show this now in the continuous case. 
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To see this more clearly, consider a uniform resistive sheet covering the 
region D, grounded on the boundary dD. Current i(x,y) per unit area is 
injected into the sheet at the point (#,y), where the potential is v(x,y). The 
potential satisfies Poisson's equation 

Av(x,y) = -pi(x,y), 

where p is the resistivity (per unit square). Now consider the current density 
per unit length extracted from the sheet at the boundary: 

where the normal derivative of the potential can be defined by 

dv dv dy dv dx 

dn dx ds dy ds ' 
with the tangent to the boundary given by 

dx dy\ 

ds* ds J 

It is clear that the total current injected into the sheet must equal the total 
current leaving through the boundary. We can show this formally using the 
two-dimensional version of Green's formula [Korn & Korn 68]: 

// (uAv-vAu)dA = / lu- v— Ids, 

JJd JdD \ dn dn) 

with v = v(x,y) and u(x,y) = 1. We obtain 



//„—=!>• 



or 



// i(x,y)dA = / j(x,y)ds. 
JJd J&D 

This works, of course, even when the boundary is not grounded. 

Now, if we instead use u(x,y) = x in Green's formula, we obtain 

which, since v(x,y) = on the boundary, becomes just 

// xAvdA= I x-^-ds, 
JJd J bd on 

so that 

// xi(x,y)dA= / xj(x,y)ds. 
JJd JdD 

So the first-order moment in the ^-direction of the boundary current is equal 
to the first-order moment in the ^-direction of the injected current. Similarly, 

// yi(x,y)dA= / yj(x,y)ds. 
JJd JdD 
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The same trick can be used with any harmonic function u(x,y), that is, a 
function for which Aw = 0. 

It is easy to see that xy and (y 2 — x 2 ) are harmonic functions, so we can 
compute their integrals in this fashion also: 

ff(y 2 -x 2 )i(x y y)dA= J (y 2 - x 2 )j(x,y)ds, 
JJd JdD 



and 



// xyi(x,y)dA = / xyj(x,y)ds. 
JJd JdD 



Now the first of these integrals corresponds to (c — a), while the second corre- 
sponds to b in the calculation of orientation. This means that we can obtain 
the position and orientation of a region just from the currents on the boundary 
of the resistive network. 

Note, however, that we cannot obtain all three second-order moments 
independently from the boundary currents. We only obtain one of the two 
second order moment invariants. Consequently we can also not compute a 
shape factor from the boundary currents. 

The two-dimensional Laplacian operator can be written in polar form as 

Au--— (r— \ — — 
r dr \ dr ) r 2 d6 2 ' 

so we see that 

Uk = r h cos(k0) and vu = r k sin(&0), 

are two families of harmonic functions. We have used the first few members 
of these sets already, namely, 

1, x = rcos6, y = rsin0, x 2 — y 2 = r 2 cos 20, and 2xy = r 2 sin 20. 

The next pair of harmonic functions one could use are the monkey-saddle 
functions 

x 3 — 3xy 2 and 3x 2 y — xy 2 . 

Continuing in this way, we see that one can compute two combinations of each 
of the (n + 1) moments of n-th order from the boundary currents. We cannot 
compute all of the moments independently. For purposes of determining the 
position and orientation, however, we only need the first few. 

3.9. Implementation Details & Previous Work 

To obtain the required combinations of moments, we have to integrate the 
product of the boundary current with 

1, ar, y, (x 2 — y 2 ) and 2xy. 
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The first is just the total current flowing out of the resistive network. The 
computation of the rest will be affected somewhat by the shape chosen for 
the resistive network. In the case of a circular image region, for example, we 
multiply the currents by weights that vary as 

1, cos#, sin0, cos 29 and sin 20, 

where 9 is the angle measured from the center of the image. Note that the 
weights are fixed for each point on the boundary. The computation may be 
simplified by using a square boundary, but at the cost of loss of rotational 
symmetry. 

There has been considerable work on finding moments using digital means. 
Special purpose systems have been developed for tracking objects using these 
schemes [Gilbert et al. 80] [Gilbert 81]. Also, a number of special purpose 
digital signal processing systems have been built to compute moments. Some 
of these systems have much of the required circuitry on a single digital chip 
[Hatamian 86, 87]. Furthermore, a discrete analog chip has been built that de- 
termines the centroid using a gradient descent method [DeWeerth &, Mead 88]. 
With considerable increase in circuit complexity this could perhaps be ex- 
tended to also determine orientation using the approach described in the first 
part of this section. 

There also exists a continuous analog light- spot position sensor that uses 
a method similar to the one described above (Selspot Systems). It consists 
of a single, large, square photo-diode and some electronics. Electrodes are 
attached on four edges of the "lateral effect" photo-diode and four operational 
amplifiers are used to measure the short-circuit current out of each of the four 
edges. The total current is just the integral of the signal. The ratio of the 
difference to the sum of the currents on opposite edges gives the position of 
the centroid in one direction. The currents in the other two edges give the 
other component of the centroid. 

Apparently the possibility of computing combinations of higher moments 
from the boundary currents, and thus determining orientation also, has not 
previously been noted. 

3.10. A Network Equivalence Theorem 

In the above we have explored two apparently quite different ways of using a 
simple resistive network: 

• Apply a given potential distribution along the edge of the network and 
use the open-circuit potentials at interiors nodes in further calculation, 
and 
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• Inject currents at interior nodes and use the measured short-circuit cur- 
rents on the edge in further calculation. 

There is an intimate relationship between these two ways of using a resistive 
network. In some cases one of the two schemes leads to much simpler imple- 
mentation than the other, so it is important to understand the equivalence. 
This will now be explored in more detail for arbitrary networks of resistors. 

Consider a resistive network with external nodes segregated into two sets 
A and B of size N and M respectively. Now perform two experiments: 

1. Connect the nodes in group A to voltage sources with potentials V n for 
n = 1, 2, . . ., N and measure the resulting open-circuit potentials on the 
nodes in group B. Let these be called v my for m = 1, 2, . . ., M. 

2. Connect the nodes in group B to current sources with currents i m , for 
m = 1, 2, . . ., M, and measure the short-circuit currents in the nodes of 
group A. Let these be called I n for n = 1, 2, . . ., N. 

Then 



JV M 



n=l m=l 



Proof: Consider in case 1 that we apply a potential only to node n in group 
A, that is, Vjfe = for k / n. Let the resulting open-circuit potential on node 
m in group B be called v m>n . We note that superposition tells us that the 
potential on node m in group B when potentials are applied to all of the nodes 
in group A is 

JV 

Vm = / v Vm,n- 
n=l 

Next, consider in case 2 that we inject current only at node m in group B, 
that is it = for I ^ m. Let the resulting short-circuit current at node n in 
group A be called I n , m - We note that superposition tells us that the current 
in node n of group A when currents are injected into all of the nodes of group 
B is 

M 

-*n = / j J-n,m- 
m=l 

The reciprocity theorem tells us that 

Now sum over all of the nodes in group A: 

N JV 

n=l n=l 
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or 

N N 

n=l n=l 

Then sum over all of the nodes in group B: 

M N M 

m=l n=l m=l 

or 

N M N M M 

/ j / j Vn*n,7n = / J Vn / J ln,m — / J *m ^m> 
n=l m=l n=l m=l m=l 

or, finally 

N M 

/ _, -*n ^n == / v *m v m- 
n=l m—\ 



3.11. Application 

One application of this theorem is in the simplification of circuits for the ana- 
log computation of some weighted average. Suppose that we have a resistive 
network that is used to compute some quantities v m (for example, a potential 
representing the x position in an image) from some fixed inputs V n (for ex- 
ample, potentials representing x on the edge of the resistive network). These 
potentials are then used to compute a weighted average like 

EM 
__ m=l l m V m 



where the quantities i m are the weights (for example, image brightness). 

Then an equivalent way of obtaining the same result is to inject currents 
proportional to i m into the resistive network, now grounded in the places 
where inputs where applied earlier. Let the currents at the places where the 
network is grounded be I n . Then the same weighted average can be obtained 
by computing instead 

V — ^m=l -*n Vn 

Which of the two schemes is simpler depends on details of the implementation, 
including the relative sizes of N and M. 

3.12. Example 

In the (one-dimensional version of the) centroid-finding chip, a potential rep- 
resenting x is generated from two fixed input potentials applied at either end 
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of a uniform resistive chain. An output current proportional to the product 
of the light intensity at a picture cell and the local value of x is injected into 
a global bus. The weighted average of the potentials at the picture cells can 
then be computed from this current and a current proportional to the total 
brightness 15 : 

EM 
— m=l V m lm 
V ~ y^M 

This allows us to determine the x position of the centroid of the light spot 

_ _ (V2 — v)xi + (v — Vi)x 2 

x ~ v^Tv, ' 

where x\ and x 2 are the coordinates at either end of the resistive chain, at 
the points where the potentials V\ and V2 are applied. 

The computation can also be performed by injecting currents propor- 
tional to the brightness at each picture cell into the same uniform linear 
resistive chain now grounded at either end. The centroid can be computed 
from the currents flowing into ground at the ends: 

_ xih +X2I2 
x = — . 

/1+/2 
In this particular case, the second scheme appears to be simpler. 

3.13. Generalizations 

The same ideas can be applied in the continuous domain, where we are dealing 
with resistive sheets, rather than networks of discrete components. In partic- 
ular, we can use it to design circuits that compute combinations of various 
moments of image brightness. Consider a uniform two-dimensional resistive 
sheet. The potential in the interior satisfies Laplace's equation 

Av(x,y) = 0. 
We can obtain a potential distribution proportional to an arbitrary harmonic 
function v(x,y) simply by applying a potential proportional to v(x,y) to the 
boundary of the sheet. 

The functions 1, x, y, xy and y 2 — x 2 are harmonic. This suggest that we 
can use this idea to compute the zeroth, first, and some combinations of the 



In the feedback version of this idea, currents are generated proportional to the 
difference between the potentials representing x and x. These are injected into 
an unterminated global bus. In the steady state we have from Kirchhoff's law: 

M 

/ j (v m ~ v)i m = 0, 
m=l 

which leads to the same result. 
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higher-order moments of image brightness. A current density proportional to 
the product of brightness at each point in the image and the potential at the 
corresponding point on the resistive sheet is injected into a global bus. The 
total current is proportional to the desired combination of moments. 

Alternatively, using the method developed above, we can use the same re- 
sistive sheet to direct the currents rather than as a way of generating potential 
distributions. At each point we simply inject a current density proportional 
to brightness. The edge of the resistive sheet is now grounded and the current 
density along the boundary is read out. The desired moment is obtained by 
integrating the product of v(x,y) and the current density on the boundary. 

In the continuous domain, with a uniform resistive grid, the equivalence 
between the two methods described above can be obtained by an application 
of Green's theorem in two dimensions for converting an integral over a region 
into an integral along the boundary of the region. We have 

/ / (uAv — vAu)dxdy = / ( u— v-r— ] ds. 

JJd JdD \ dn dnj 

Now let u(x,y) be the potential on the resistive sheet, while v(x,y) is some 

chosen harmonic function. Then, if the current density injected into the sheet 

is i(x,y) and its resistivity /o, we have 

Au(x,y) = -pi(x,y), 

while the current density along the boundary is given by 

du(x,y) 

Using the fact that u(x,y) = on the boundary, and that v(x,y) is harmonic 
we obtain from Green's theorem: 



// (—vAu)dxdy = I I — v-r— ] ds. 
JJd ) )bd\ dnj 



Substituting for u(x,y) in the interior in terms of the injected current density 
and for the normal derivative of u(x,y) on the boundary, we finally see that 

// v(x,y)i(x,y)dxdy = / v(x,y)j(x,y)ds. 
JJd JdD 

The above analysis holds as long as the multiplier is a harmonic function. 

The theorem presented earlier is more general, since it applies even to 
networks that are not uniform and does not require that the multiplier func- 
tion be harmonic. It also can be used directly on discrete networks and does 
not involve approximating a continuous resistive sheet with a discrete one. 

In the discrete case, there is a definite implementation advantage to the 
second scheme, since there are few nodes on the boundary in comparison to 
the number of nodes in the interior of the resistive grid. 



4. Short Range Motion Vision Methods 39 

4. Short Range Motion Vision Methods 

Attacks on the motion vision problem can be categorized in a number of ways. 
First of all, there is the question of how a large a change between successive 
images the method is meant to deal with. Feature-based methods appear to be 
best suited for the so-called long-range motion vision problem, where there is 
a relatively large change between images. Conversely, these methods generally 
are not good at estimating motions with sub-pixel accuracy. Feature-based 
methods essentially solve the correspondence problem, which is the central 
problem in binocular stereo. Unfortunately, the problem in motion vision is 
typically even harder than the binocular stereo problem, because the search 
for a match is not confined to an epipolar line. 

Gradient-based methods are better suited to situations where the motion 
between successive images is fairly small, that is, the short-range motion vision 
problem. Correlation methods appear to fall somewhere in between, since they 
cannot deal with significant changes in foreshortening or photometric changes, 
yet are not able to produce displacement estimates with sub-pixel accuracy. 

There are several different approaches to the short-range motion vision 
problem. Here we briefly list some based directly on brightness derivatives 
rather than matching of isolated features or correlation. We first discuss sev- 
eral methods for recovering optical flow and then go on to methods for recov- 
ering rigid body motion directly, without using optical flow as an intermediate 
result. 

All methods for recovering motion implicitly make some assumptions 
about how images change when the viewer moves with respect to the scene. 
Simple correlation methods, for example, assume that changes in foreshorten- 
ing can be ignored. This is not a good assumption in wide-baseline binocular 
stereo nor in some long-range motion vision applications. Feature-based meth- 
ods and correlation methods also assume that the brightness pattern does not 
change drastically with viewpoint. Fortunately, the brightness of many real 
surfaces does not depend significantly on the viewing direction for a fixed 
illumination geometry. 

Methods based on brightness gradients implicitly assume that the varia- 
tions in brightness at a particular point in the image due to motion are much 
larger than the brightness fluctuations induced by changes in viewpoint. This 
is a reasonable assumption unless the surface lacks markings and is illumi- 
nated by rapdidly moving light sources. Most methods will be fooled by the 
motion of virtual images resulting from specular or glossy reflections of point 
light sources. 
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4.1. Recovering Optical Flow from Brightness Derivatives 

The motion field is the projection in the image of velocities of points in the 
environment with respect to the observer. Observer motion and object shapes 
can be estimated from the motion field. The optical flow is a vector field 
in the image that indicates how brightness patterns move with time. The 
optical flow field is not unique, since the matching of points along an isophote 
in one image with an isophote of the same brightness in the other image is 
not unique. Additional constraints have to be introduced in order to select a 
particular "optical flow." Under favorable circumstances the optical flow so 
computed is a good estimate of the motion field. There are several algorithms 
of different complexity and robustness for estimating optical flow. At one 
end of the spectrum we have algorithms that assume the flow is constant 
over the image, at the other, there are algorithms that can deal with depth 
discontinuities. Many of the interesting variations are listed here in order of 
increasing complexity: 

1. Constant Optical Flow [Nagel 84] & [Weldon 86]: Here the flow veloc- 
ity, (w,v), is assumed to be constant over the image patch. This may be 
a good approximation for a small field of view. Several cameras aimed 
in different directions (spider head) could yield flow vectors that provide 
the information necessary to solve for the observer motion. Alternatively, 
this computation may be applied to (possibly overlapping and weighted) 
patches of one image. A basic least squares analysis leads to a simple algo- 
rithm. All that is required is: (a) estimation of the brightness derivatives 
E x , E y , and E t , (b) accumulation of the sums of the products E%, E x E y , 
Ey, E x E t , and E y E t , and, (c) solution of two linear equations in the two 
unknowns u and v. This last step could be done off-chip, using the totals 
accumulated on-chip. Alternatively, the computation can be done in an 
iterative or feedback mode on chip (as it is in [Tanner &: Mead 87]). The 
bandwidth going off-chip is very low in either case. If the computation is 
done for many (possibly overlapping and weighted) image windows, then 
an optical flow vector field results (at resolution less than the full image 
resolution). Such a vector field can then be processed off-chip to yield 
camera motion and scene structure using a least-squares method (a Id 
[Bruss k Horn 83]). 

2. Basic Optical Flow [Horn & Schunck 81]): Here the velocity field is 
allowed to vary from place to place in the image, but is assumed to vary 
smoothly. Depth discontinuities are not treated, but elastic deformations, 
fluid flows and rigid body motions yield reasonable results. The calculus 
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of variation problem here leads to a coupled pair of Poisson's equations 
for u(x,y) and v(x,y), the components of the optical flow. The right- 
hand sides of these equations (that is, parts not involving u and v) are 
computed from the brightness derivatives. One needs to be able to com- 
pute values such as (a 2 + El + E 2 ) (or approximations thereto). The 
partial differential equations themselves, of course, can be conveniently 
solved on two interlaced resistive networks. The inputs may be currents 
injected at nodes, while the outputs are the potentials there. The bound- 
aries have to be treated carefully. The algorithm is robust with respect 
to small random errors in the resistive network. (It is, by the way, not 
robust against round-off error in the digital version, common when the 
number of bits available to representing u and v are limited). As usual, 
there is some small advantage to working on a hexagonal grid. 

3. Optical Flow with Multiplier [Gennert & Negahdaripour 87]: The 
basic optical flow algorithm is based on the assumption that the bright- 
ness of a small patch of the surface does not change as it moves. In 
practice there are small brightness changes, since the shading on the sur- 
face may change slowly as a patch moves into areas that are illuminated 
differently. When the surface is highly textured, brightness variations at 
a point in the image resulting from motion are much larger than those 
due to changes in shading and illumination, and so these can be safely 
ignored. If there is no strong texture on the surface, somewhat better 
results can be obtained if one takes account of these small changes in 
shading. One can do this using a simple multiplier model. Here the 
brightness of a patch in a frame of an image sequence is assumed to be a 
multiple of the brightness of the same patch in the previous frame. The 
multiplier (assumed to be near unity) is allowed to vary from point to 
point in the image, but is assumed to vary slowly with position. The 
resulting calculus of variation problem now leads to three coupled par- 
tial differential equations. The new algorithm is not much more complex 
(about 50% more work) than the basic one, yet yields better results. 

4. Optical Flow with Discontinuities [Koch, Marroquin &; Yuille 86] 
[Gamble & Poggio 87] [Hutchinson, Koch, Luo Sz Mead 87] [Murray & 
Buxton 87]: The notion of a line process for dealing with discontinuities 
in images originated with [Geman & Geman 84]. This idea was later ap- 
plied to discontinuities in optical flow by [Koch, Marroquin & Yuille 86], 
[Hutchinson, Koch, Luo & Mead 87] and [Murray & Buxton 87]. To deal 
with discontinuities in the optical flow, which typically occur at object 
boundaries, one introduces line processes that cut the solution and pre- 
vent smoothing over discontinuities. The resulting penalty function to be 
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minimized is no longer convex and the solution involves more than simply 
solving a set of coupled partial differential equations. It seemed at first 
that this approach was doomed to failure, since methods like simulated 
annealing for solving such nonlinear problems are hopelessly inefficient 
on an ordinary serial computer. However, a reasonably efficient method 
results if one gives up the demand for the absolute global minimum and 
instead is satisfied with a good solution, with cost close to the absolute 
minimum cost [Blake & Zisserman 88]. It helps to base the decision about 
whether to introduce a line process at a particular place only on the lo- 
cal change in the cost of the solution [Geman & Geman 84]. Further 
improvements in performance can be had if line processes are allowed 
only very near to discontinuities in brightness, that is, edges [Gamble & 
Poggio 87] . This suggests integrating some edge finding algorithm on the 
same chip. The approach here leads to an analog network that interacts 
with some logic circuits implementing the line-process decision making 
(see Figure 5 in [Koch, Marroquin & Yuille 86]). 

Often there is a concern about the rate of convergence of simple methods for 
solving Poisson's equation. Multi-grid methods are suggested as a means of 
speeding up the process. This is fortunately not so much of a concern here 
since: 

• It is rare to have no inputs (zero right-hand side) over large patches (that 
is, large patches of uniform brightness are rare). 

• The analog networks ought to settle fairly rapidly, even when there are 
many nodes since the time-constant should be small. 

• Excellent starting values are available from the solution for the previous 
frame. 

Because it is difficult to get good estimates of optical flow from noisy image 
data, there has been a trend recently to go directly to the ultimately desired 
information, namely observer motion and object shape. Instead of computing 
these from a flow field, they are derived directly from image brightness and 
the partial derivatives of brightness. These methods too lend themselves to 
implementation in a parallel network (see next section). They do, however, 
assume rigid body motion. Thus these methods are of little use when we are 
dealing with elastic deformations and fluid flow. Consequently there is still 
a strong interest in finding rapid, robust methods for estimating the optical 
flow. 

4.2. Direct Recovery of Rigid Body Motion 

It is possible to derive observer motion and object shape directly from bright- 
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ness gradients using something like a least-squares approach. These methods 
are not as mature as those for estimating the optical flow, but may ultimately 
be of more interest. A number of special cases have been solved so far: 

1. Pure Rotation [Alomoinos & Brown 85] [Horn & Weldon 88]: In the 
case of pure rotation, the motion field is particularly simple since it does 
not depend on the distances of the observer from the objects in the scene. 
In this case a simple least-squares analysis leads to a set of three linear 
equations in the three unknown components of the angular velocity vector 
u) = (A, B, C) T . The coefficients of these equations are once again sums 
over the whole image of products of brightness derivatives and image 
coordinates. The algorithm is remarkably robust with respect to noise in 
the brightness derivatives, since the problem is so highly overdetermined 
(three unknowns and hundreds of thousands of measurements). 

2. Pure Translation [Horn h Weldon 88]: In the case of pure translation, 
the task is to recover the direction of the translation vector. The focus of 
expansion is the intersection of this vector with the image plane, that is, 
it is the image of the point towards which the observer is moving. Once 
the focus of expansion has been located, relative distances of selected 
points in the scene (where the brightness gradient is large enough in the 
direction towards the focus of expansion) can be estimated. (One simply 
divides the rate of change of brightness in the direction towards the focus 
of expansion by the time rate of change of brightness.) There are several 
methods for recovering the direction of translation. The most promising 
at this point requires eigenvector-eigenvalue decomposition of a 3 x 3 
matrix constructed using sums of products of brightness derivatives and 
image coordinates. These sums could be computed on-chip, with the final 
analysis being done off-chip. This algorithm is not nearly as robust as 
the one for pure rotation, since there are now an enormous number of 
additional "unknowns," namely the distances to the scene at each picture 
cell. For the same reason this algorithm is much more interesting since 
it allows us to recover depth and thus obtain surface shape information. 

3. Planar Surface [Horn & Negahdaripour 87]: If the scene consists of a 
single planar surface (perhaps an airport viewed from a landing aircraft), 
it is possible to compute the direction of translation, the orientation of 
the plane, the rotational velocity of the observer, as well as the time to 
impact, directly from certain sums accumulated over the whole image. 
There is a two-way ambiguity in the result that can be resolved using 
other sensory information or by waiting for new solutions based on sub- 
sequent frames. The sums required are "moments," products of the par- 
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tial derivatives of brightness (E x , E y , and E z ) and the image coordinates 
a;, and y. The final calculation involves eigenvector-eigenvalue decompo- 
sition of a 3 x 3 matrix constructed using these sums, but this can be 
done off-chip. Both closed form and iterative solutions are known. There 
are quite a large number of different sums needed, but each is relatively 
simple to compute. 

4. Other Constraints on Motion: E.J. Weldon and his students at the 
University of Hawaii have been investigating a number of other special re- 
strictions on motion. A wheeled vehicle moving in contact with a smooth 
surface is confined to translation in the local tangent plane and rotation 
about the local normal. Thus the rotation vector has to be perpendicular 
to the translation vector. This constraint allows a solution of the motion 
vision problem that takes a form very similar to the one discussed above. 
Another interesting special case arises when the vehicle can rotate only 
about an axis parallel to the translational vector. There is also strong 
interest in exploiting fixation or tracking. If one fixates on a point in 
the moving environment, a constraint is introduced between the instan- 
taneous rotational and translational velocities of the observer relative 
to the environment. This allows one to simplify the motion constraint 
equation and reduces the problem to something similar to that of pure 
translation. 

The general case (arbitrary surface, both translation and rotation) has not 
been solved yet. Also, the pure translation solutions are not very robust, 
suggesting that to one needs to continue the solution in time in order to 
get stable results (all of the methods discussed above work "instantaneously" 
using two image frames, and do not make much use of information in earlier 
frames). 

In the case of pure translation, depth is recovered only in places where the 
local brightness gradient is strong enough in the direction towards the focus 
of expansion. This suggests the need for a smooth interpolation process that 
fills in the rest. It might take the form of the solution of Laplace's equation 
or the bi-harmonic equation. A simple passive network will do for Laplace's 
equation, of course. If the higher order approach is taken, negative resistances 
and more connections are required. It is possible, however, as we saw earlier, 
to decompose the bi-harmonic equation into coupled Laplace equations. The 
latter can then be solved using coupled resistive network. 

Finally, to deal with depth-discontinuities, one can introduce line-process- 
es once again. Naturally, we are now talking about a pretty complex system! 
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4.3. Contant Flow Velocity 

The method that assumes that optical flow is constant in a patch will be con- 
sidered next, as a simple illustration of the kind of approach taken. First we 
review the brightness change constraint equation. Image brightness E(x,y,t) 
is a function of three variables. If the brightness of a small patch does not 
change as it moves, we can write: 



dE n 



which can be expanded to yield: 



dEdx dEdy dE n 

1 -| = 

dx dt dy dt dt ' 

or 

uE x + vE y + E t = 0, 
where E x , E y are the components of the brightness gradient, while E t is the 
time rate of change of brightness. This so-called brightness change constraint 
equation provides only one constraint on the two components of image flow, 
u and v. Thus image flow cannot be recovered locally without further infor- 
mation. 

Suppose now that the image flow components u and v are constant over a 
patch in the image. Then we can recover them using a least squares approach: 
We minimize the total error 

1=11 {uE x + vE y + E t ) 2 dx dy. 
Differentiation with respect to u and v leads to 

— = (uE x + vE y + E t )E x dxdy, 

-^=11 (uE x + vE y + E t ) E y dx dy. 
Setting these derivatives equal to zero, we obtain 

u ff E 2 x + v ff E x E y = - ff E X E U 
Jjd JJd JJd 

u Jh E » E * +v JJ D E2 >-JL E » E '- 

These are two linear equations that can be easily solved for u and v. 
Du= ff E 2 y ff E x E t - ff E x E y JJ EyE u 

Dv =JL E * E « IL E * Et - JL E2 * IL E > E " 



and 
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where D is the determinant of the coefficient matrix, that is, 



D= IL El IL E >-{IL E * E > 



2 



The coefficients are easily calculated in parallel, if so desired. 

While this closed form solution is very appealing in a sequential digital 
implementation, it involves division and other operations that are not par- 
ticularly easily carried out in analog circuitry. In this case, an iterative or 
feedback strategy may be favoured. Using a gradient descent approach, we 
arrive at 

-^ = -a / / (uE x + vE y + E t ) E x dx dy, 

A t t 

— = - a ( U E X + vE y + E t ) E y dx dy. 

At each picture cell, we estimate the derivatives of brightness, and compute 
the error in the brightness change onstraint equation 

e = (uE x + vE y + E t ) , 
using global buses whose potentials represent u and v. Currents proportional 
to —eE x and —eE y are injected into the buses for u and v respectively. This 
is essentially how the constant flow velocity chip of Tanner and Mead works 
[Tanner 86] [Tanner & Mead 87]. 

4.4. Special Purpose Direct Motion Vision Systems 

We have seen that in short-range motion vision one need not solve the cor- 
respondence problem. One can instead use derivatives of image brightness 
directly to estimate the motion of the camera. The time rate of change of 
image brightness at a particular picture cell can be predicted if the brightness 
gradient and the motion of the pattern in the image is known. This two- 
dimensional motion of patterns in the image, in turn, can be predicted if the 
three-dimensional motion of the camera is given. Given these facts, it should 
be apparent that the motion of the camera can be found by finding the motion 
that best predicts the time rate of change of brightness (t-derivative) at all 
picture cells, given the observed brightness gradients (x- and y-derivatives). 
Once the instantaneous rotational and translational motion of the camera 
have been found, one can determine the depth at points where the brightness 
gradient is large and oriented appropriately. 

As discussed above, several special situations have already been dealt 
with, including the case where the camera is known to be rotating only, the 
case where the camera is translating only, and the case of arbitrary motion 
where the surface being viewed is known to be planar. The solution in the 
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case of pure rotation is very robust against noise (since there are only three 
unknowns and thousands of constraints) and so well worth implementing. The 
solution in the case of arbitrary motion with respect to a planar surface is also 
quite robust, although it is subject to a two-way ambiguity. In this case there 
are eight unknowns (the rotational velocity, the translational velocity and the 
unit surface normal). The solution in the case of pure translation is more 
sensitive to noise (since there are about as many unknowns as constraints), 
but of great interest, since depth can be recovered. An elegant solution to the 
general case has not yet been found. It can, however, be expected that it will 
not be less robust than the pure translation case (since there are only three 
more unknowns). 

We will now describe in detail a method for the solution of the pure 
rotation case and a method for the solution of the pure translation case. We 
saw earlier that if the brightness of a patch does not change as it moves, we 
obtain the brightness change constraint equation 

uE x + vE y + E t = 0, 

where E x , E y are the components of the brightness gradient, while E t is the 
time rate of change of brightness. This equation provides one constraint on 
the image flow components u and v. Thus image flow cannot be recovered 
locally without additional constraint. 

We are now dealing, however, with rigid body motion, where image flow 
is heavily constrained. The image flow components u and v dependent on the 
instantaneous translational and rotational velocities of the camera, denoted 
t = (U, V, W) T and lj = (A, B, C) T respectively. It can be shown by dif- 
ferentiating the the equation for perspective projection [Longuett-Higgins &; 
Prazdny 80], that 

-U + xW A _ os „ 

u = - + Axy - B(l + x 2 ) + Cy, 

v= ~ V + yW +A(l+y 2 )-Bxy-Cx, 

where Z is the depth (distance along the optical axis) at the image point 
(x,y). combining this with the brightness change constraint equation, we 
obtain [Horn & Weldon 88] 

JE?«+vw+-s-t = 0, 

Zj 

where 

+E y + y(xE x +yE y ) 
v = ( -E x - x(xE x + yE y ) 
yE x — xE y 
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and 

/ -E x 

S = ~Ey 

\xE x +yE y 
This is called the rigid body brightness change constraint equation. 

4.5. Feedback Computation of Instantaneous Rotational Velocity 

Horn & Weldon [1988] rediscovered a method apparently first invented by 
Alomoinos & Brown [1985] for direct motion vision in the case of pure rotation. 
This method uses integrals of products of first partial derivatives of image 
brightness and image coordinates and involves the solution of a system of 
three linear equations in three unknowns. When there is no translational 
motion, the brightness change constraint equation becomes just 

E t + v • lj = 0. 

This suggests a least-squares approach, where we minimize 

+ v • a;) dx dy, 



-//»<* 



by suitable choice of the instantaneous rotational velocity lo. This leads to 
the simple equation 

( / / vyT dxdy) lj = — / / E t \ dx dy. 

This vector equation corresponds to three scalar equations in the three un- 
known components A, B, and C of the instantaneous rotational velocity vec- 
tor. The system of linear equations can be solved explicitly, but this involves 
division by the determinant of the coefficient matrix. When considering analog 
implementation, it is better to use a resistive network to solve the equations. 
Yet another attractive alternative is to use a feedback scheme (not unlike the 
one used to solve for the optical flow velocity components in the case when 
they are assumed to be constant over the image patch being considered). 

Finally, the solution can be obtained by walking down the gradient of the 
total error. The derivative with respect to u of the sum of squares of errors 
is just 

dl _ 

duo 
This suggest a feedback scheme described by the equation 

— = -a / / (E t + v • u;)v dx dy. 

The idea revolves around a bus, with potential on three wires proportional 
to the present estimates of the components A, B and C of the instantaneous 



/ / (E t 4- v • lj)v dx dy. 
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angular velocity lj. Estimates of the partial derivatives of image brightness 
(the components of the brightness gradient and the time rate of change of 
brightness) are computed at each picture cell. From them, and the position 
(x,y) of the cell, one can compute v. The coordinates x and y can be made 
available to each cell using resistive chains that are connected to fixed poten- 
tials on the sides of the chip. (It may be useful also to directly supply xy, 
(1 + x 2 ) and (1 + y 2 ), since these are coefficients in the expression for v). 
Next, one computes the error term 

e = E t + v • a;, 
which, in the absence of noise, is zero when the correct solution has been 
found. Currents are fed into the bus proportional to 

— ev = — (E t + v • u;)v. 
Each of the three bus wires is terminated in a capacitance 16 . We now have a 
system that obeys an equation like 

— = -a II (E t + v • w)v dx dy, 

the steady state solution of which is 



//. 



(E t + v • u;)v dx dy = 0, 
'D 
or 



(JL 



r T 




vv i dx dy I l> = — I E t v dx dy. 



The feedback scheme involves considerably less computation than the closed 
form solution (for example, we don't have to compute the 3x3 matrix vv T ). 
Also, the feedback scheme can be shown to be stable (as long as the integral 
of vv T is not singular, that is, as long as there is sufficient contrast in the 
image texture). 

The elementary components needed are the photo-sensors, differential 
buffer amplifiers that estimate spatial derivatives, approximate time delays 
for estimating the temporal derivative, four-quadrant analog multipliers, and 
current sources. There also will be resistive chains to supply values of x and 
y at each image location. 

4.6. Computation of Instantaneous Translational Velocity 

While the scheme described above for recovering the rotational velocity is very 
robust as shown both by sensitivity analysis and experimentation on comput- 
ers with both synthetic and real images, it is does not allow us to recover 



16 Unless there are instability problems due to unmodeled effects, one may be able 
to just rely on the parasitic capacitances. 
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depth. This is because there is no dependence of the brightness derivatives 
on depth when there is no translational motion. We now consider the other 
extreme, when there is only translational motion. 

When there is no rotational motion, the brightness change constraint 
equation becomes just 

£« + (s-t)i = 0. 

Note that multiplying both Z and t by a constant does not perturb the equal- 
ity. This tells us right away that there will be a scale factor ambiguity in 
recovering motion and depth. We take care of this by attempting only to 
recover the direction of motion. That is, we will treat t as a unit vector. 

We can solve the constraint equation above for the depth Z in terms of 
the unknown motion parameters. We obtain 

st 
E t 
If our estimate of the instantaneous translational motion t is incorrect, we will 
obviously obtain incorrect values for the depth from this equation. Some of 
these values may be negative (which correspond to points on objects behind 
the camera), while others will be unexpectedly large. Some methods have 
been explored that to find a direction of translational motion that yields the 
smallest number of negative depth values when applied to the image brightness 
gradients [Horn & Weldon 88]. Although these methods work, they have yet 
to show promise in terms of computational expediency. We consider another 
approach next. 

In many cases, particularly in industrial robotics, the depth range is 
bounded and the occurrence of very large depth values is not normally an- 
ticipated. One method for estimating the instantaneous translation velocity 
makes use of this observation 17 . We essentially look for a translational veloc- 
ity t that keeps Z small at most points in the image. Suppose, for example, 
that we find the translational velocity that minimizes 



= Jj ' Z 2 dxdy = JJ (^2-dxdy, 



subject to the constraint that t be a unit vector. We cannot measure bright- 
ness exactly, so there will be some error in our estimate of E t . To avoid 
problems due to noise in places where E t is almost zero, we may introduce an 
offset in the denominator as follows: 



/ = 



II w(E t )(s-t) 2 dxdy, 



The derivation of the method in terms of a minimization of the integral of Z 2 is 
merely an explanatory artifice. There is a way of arriving at the same result in a 
way does not appear to be this ad hoc [Horn & Weldon 88]. 
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where w(E t ) = l/(Ef + e 2 ). This integral can also be written in the form 

I = t T f ff w (E t ) ss T dx dy\ t = t T S t, 

where S is a 3 x 3 matrix. The expression for J is clearly a quadratic form in 
t. Given the constraint that t be a unit vector, such a quadratic form attains 
its minimum when t is the eigenvector of the matrix S corresponding to the 
smallest eigenvalue [Korn & Korn 68]. 

We explore in the next section how a circuit can be devised to compute 
this eigenvector. 

4.7. Finding Eigenvectors Using Analog Networks 

In one of the direct methods for recovering translational motion, the direction 
of motion is found to be the eigenvector of a symmetric 3x3 matrix S associ- 
ated with the smallest eigenvalue. The coefficients of the matrix are sums of 
products of image coordinates and first derivatives of image brightness. Note 
that the computation of the eigenvector needs to be done only in one place, 
using data accumulated over the whole image, rather than at each picture 
cell. It could potentially be done on a serial computer using the accumulated 
total obtained. An interesting question is whether this eigenvector can be 
found using an analog network, hopefully by means of a network that is not 
too complex. There are actually several ways of doing this. 

First of all, note that a dot-product can be computed using three multi- 
pliers, while a cross-product takes six. The product of a 3 x 3 matrix and an 
arbitrary vector requires nine multipliers. If we were looking for the eigenvec- 
tor associated with the largest eigenvector, we could use the observation that 
the iteration 

t n+1 = 5t n , 
converges to a multiple of this eigenvector given virtually any starting value 

v = a 1 e 1 + a 2 e 2 + a 2 e 3 , 
since 

S k (a 1 e 1 + a 2 e 2 + a 3 e 3 ) = A*ojiei + X 2 a 2 e 2 + \ 2 a 2 e 3 , 
where Ai, A 2 , and A3 are the eigenvalues and ei, e 2 , e 3 are the corresponding 
eigenvectors. As long as the eigenvalues are distinct, the term correspond- 
ing to the largest eigenvalue will dominate after a number of iterations (or 
equivalently, many time constants in a feedback implementation). 

For the results of such an iteration to remain within bounds, the result 
must be renormalized each time. Doing this the obvious way involves division, 
but a feedback circuit can achieve the same effect using only multiplication as 
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follows: (a) each of the components of the vector is multiplied by an adjustable 
positive scale factor, (b) the magnitude squared of the result is computed and 
(c) the scale factor is adjusted if the magnitude is not equal to one. The 
scale factor itself equals the inverse of the largest eigenvalue when the system 
stabilizes. Renormalization may be based on the maximum of the absolute 
values of the components instead of the sum of squares, if this turns out to be 
cheaper to compute. There may be some stability questions here, since the 
normalization is in essence trying to stabilize a positive feedback loop. 

We can get the eigenvector associated with the smallest eigenvalue if 
we apply the same idea to the inverse matrix S" 1 . Of course, inverting the 
matrix is in itself not trivial. But we can instead set up a network to solve 
the equation Sv = t for v given t, and use the iteration 

5t n+1 =t n , 
so that we do not have to explicitly invert the matrix. 

Another method is based on a different view of the value being minimized 

J = t T 5t. 
The gradient is just 

£ = 25t ' 

so we could consider adjusting the present guess for t in the direction of 

steepest descent. A problem with this is that we are dealing with a constrained 

minimization problem. Steepest descent will in fact just lead to the trivial 

solution t = 0. We have to maintain the condition that ||t|| 2 = 1. One way 

of doing this is to introduce a Lagrangian multiplier and add a term to the 

integral above: 

/' = t T St + A(t • t - 1). 

We can then take the derivatives with respect to t and A: 

dV 

— - = 25t + 2At, 

at 

dl' 

— = t • t — 1. 

d\ 

Unfortunately, the minimum of the original problem corresponds to a saddle 
point in this modified problem (where we have four instead of three unknown 
parameters). So descent along the gradient will not get us to the solution (but 
we could use the method of [Piatt h Barr 88]; see later). 

One way to cirucmvent this difficulty is to note that in this special case 
we can compute the value of A at the extremum: 

A = -t T 5t. 

This provides us with a way of estimating A that does not involve gradient 
descent. 
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Another approach is to remove the component of the gradient of I' in the 
direction of the gradient of the constraint function (t • t — 1). Such gradient 
projection methods lead to viable feedback schemes, as shown later. 

However, in this special case, we can do something simpler. We can 
normalize the integral by dividing by (t • t). This makes the result insensitive 
to changes in the magnitude of t. To then force the result to also satisfy 
the constraint, we add a term that penalizes departure from the condition 
(t-t-l) = 0: 

I" = -l-t T St + /i(t.t-l) 2 . 
The gradient now is 

it = w^r ((t ' t] st ~ (tTst) t} + Mt ' t ~ 1)t ' 

This suggests a feedback scheme like 

j- = -a((t • t) St - (t T St) t) - 0(t • t - l)t. 

The first term above equals 

-a((t x5t)xt), 

and so is orthogonal to t. We can conclude that the magnitude of t is not 
affected by adjustments resulting from this term. It is the second term, arising 
from the penalty function, that forces the magnitude of t to approach one as 
time increases. 

At this point we need to remember that 



and 



t T 5t= ff(s-t) 2 dxdy. 



So we have to estimate the brightness derivatives E x , E y and Et at every 
picture cell and compute s, the weighting factor w(Et), and the dot-products 
(s • t) and (t • t). We then generate currents proportional to 

-a w(E t ) (s • t)((t • t)s - (s • t)t) 
that are injected into a global three wire bus whose potentials represents the 
components of the translation vector t. The bus wires are terminated in 
capacitors connected to ground 18 . 

A separate circuit computes the error in the magnitude squared of t, and 
produces currents proportional to 

-/?(t-t-l)t, 



18 Again, the normal parasitic capacitances of the wires may be relied upon unless 
there are instability problems due to unmodeled effects. 
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that are injected into the capacitors whose potentials represent the compo- 
nents of t. 

If t minimizes the error integral, so does — t. One thus may arrive at one 
of two solutions depending on the initial conditions. Also, the adjustments are 
zero when t = 0. This equilibrium point at the origin is metastable, however, 
so any small disturbance will lead the system away. 

All of the above should, by the way, appear to the reader to be very 
familiar, since we used similar ideas in the previous section for finding the 
axis of least inertia of a binary image. The only real difference is that we were 
dealing there with a 2 x 2 matrix instead of a 3 x 3 matrix. 

The elementary components of the circuit needed are similar to the ones 
that are needed for the pure rotation case. One difference is that the vector s, 
is simpler to compute than v, the vector needed in the case of pure rotation. 
On the other hand, some nonlinear transfer function is needed to compute the 
weighting factor w(Et). 

4.8. General Motion with Respect to Planar Surface 

Another special case of considerable interest is that of general motion (trans- 
lation and rotation) with respect to a planar surface. This is the motion vision 
problem confronting a pilot landing an aircraft (although, of course, a pilot 
has many additional visual cues available). A planar surface can be described 
by an equation of the form 

n-R = 1, 

where n is a vector parallel to the normal with length equal to the inverse 
of the perpendicular distance of the plane from the origin. The perspective 
projection equation is 

- JL_ ?: 

r ~ R z " Z' 

where R = (X, Y, Z) T is the coordinate of a point in the scene, while r = 
(x,y, 1) T is the coordinate of the corresponding image point. In the case of 
a planar surface, Z = l/(n • r), so that the rigid body brightness change 
constraint equation, 



E t + v • u> + -s • t = 0, 



becomes 



E t + v • w + (n • r) (s • t) = 0. 
Note that the equality is not disturbed if we replace nbyfcn while at the same 
time replacing t by kt. This is just a reflection of the scale factor ambiguity 
already discussed above. We can allow for it by forcing t (or n) to be a 
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unit vector. It can also be shown that the equality is not disturbed if we 
exchange n and t and replace u with (w + nxt) [Horn &; Negahdaripour 87]. 
This possibly unexpected result shows that there will be a two-way ambiguity 
(unless n happens to be parallel to t). 

A least-squares problem suggested by the above analysis has us minimize 

/= // (E t +v-L> + {n-r)(s-t)) 2 dxdy, 

by suitable choice of a;, t, n. Adding a Lagrange multiplier to help enforce 
the condition t • t = 1, leads to 

V = // (E t + v • u> + (n • r)(s • t)) 2 dxdy + A(t • t - 1). 

Perhaps surprisingly, this problem has a closed-form solution [Horn & Ne- 
gahdaripour 87]. This solution is, however, complex, involving eigenvec- 
tor/eigenvalue decomposition of matrices and other operations. Iterative so- 
lutions that were discovered earlier suggest more reasonable parallel imple- 
mentations. We can, for example, once again consider the gradient of the 
total error. 

We note that the derivatives of the total error with respect tow, t, n, 
and A are 

du 



2 if (E t +\'LJ + (n'r)(s-t))vdxdy, 
— =2 // (E t + v - u + (n - r)(s • t)) (n • r)s dx dy + 2\t, 
= 2 // (E t +V'U + (n-r)(s't))(s-t)rdxdy, 
= (t-t-l), 



dn 

dl_ 

dX 

respectively. Once again, we find that the minimum of I corresponds to a 
saddle point of I', so that straightforward application of gradient descent will 
not provide us with the solution. One way to avoid this problem is to derive 
a closed-form expression for the Lagrange multiplier at the extremum and 
use this to estimate the multiplier during the iteration, rather than using 
gradient descent to adjust the estimate. Alternatively, one can use a gradient 
projection method to arrive at a viable scheme. 

One way of doing this is to make sure that the adjustments St in t are 
always orthogonal to t. This can be done by removing the component parallel 
to t from the gradient g as follows 

a = -A(g-( g -t)t), 
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where we have assumed that t really is a unit vector. If it is not, we can use 
instead 

a = -A((t.t) g -(g-t)t), 

This can also be written as a double cross-product: 

St = -A(g x t) x t. 
Alternatively, we can just adjust t in the direction of steepest descent and 
then renormalize the result. For small adjustments the two methods produce 
similar same result, and the renormalization method is simpler. We then have 

t n+l = t" - Ag" 

l|t n -Ag»||' 
By the way, which of the two possible solutions one arrives at will depend 
on the initial conditions. Fortunately, the functional relationship between the 
two solutions is known, so that one can easily compute one from the other. 
Note that the computation of the local departure from satisfaction of the 
brightness change constraint equation, 

£*+v-u> + (n-r)(s-t), 
need be done only once. Still, there are a considerable number of multiplica- 
tions at each picture cell, including those needed to compute s and v. The 
methods developed for the other special cases should help in the implementa- 
tion of this more ambitious one. 

4.9. Gradient Projection Method 

In the above we have designed circuits for solving constrained minimization 
problems. In several cases the form of the term to be minimized and the 
form of the constraint made it possible to arrive at a related unconstrained 
problem, simply by dividing by a suitable normalizing factor (and adding a 
suitable penalty factor). This cannot always be done. We now discuss in more 
detail a general method for dealing with these problems. 

When implementing algorithms using analog circuitry, it is often conve- 
nient to depend on something like steepest descent to solve a minimization 
problem, even when a closed form solution is available. The reason is that the 
closed form solution may involve operations (such as matrix inversion) that 
are difficult to implement directly in analog circuitry. An iterative scheme 
may take many steps to converge, but if the steps can be performed rapidly, 
this is not a concern. Similarly, a feedback scheme may take a time that is a 
large multiple of a basic time constant of the circuitry, but this is not an issue 
if that time constant is small enough. 

A difficulty arises, however, when the minimization problem happens to 
be constrained. The introduction of Lagrange multipliers, for example, yields 
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a search for stationary values in a larger space, but the extremum of the orig- 
inal constrained problem now corresponds to a saddle point, and thus can not 
be found using steepest descent. One possibility is to modify the equations by 
reversing the sign of the gradient component in the direction corresponding to 
the Lagrange multiplier and so obtain convergence, at least near the extremum 
of the original problem [Piatt & Barr 88]. Global convergence can often be 
restored by adding a penalty term that grows quadratical with distance from 
the constraint surface. This makes the overall scheme fairly complex. 

Let us look at the constrained problem in more detail now. Suppose that 
/(x) is to be minimized, but that the solutions are constrained to lie on the 
surface g(x) = 0. If we simply follow the gradient 

/ -£ 

Jx — j 1 

ax 
we will in general not preserve the condition that g(x) have a constant value. 
What we can do is to remove the component of the gradient of /(x) in the 
direction of the gradient of g(x) to obtain: 

r JX ' 9x 

Jx 9x- 

9x • 9x 

Moving in this direction, which is clearly orthogonal to g x , does not alter the 

value of g(x). We can, of course, us any convenient multiple of this vector to 

arrive at a feedback scheme. For example, we may wish to use 

dx 

— = -a((g x ■ g x )f x - (g x • f x )g x ). 

which can also be written in the form 

dx 

— = -a((g x xf x )xg x ). 

Clearly, (dx/dt) • g x = 0, so that g(x) remains constant if we follow this 
trajectory exactly. 

In practice, small errors will lead to departures from a particular surface 
g(x) = c, and furthermore, we may not start right away on the surface g(x) = 
0. So there is a need to introduce a force that pulls the solution towards this 
constraint surface. We could try to do this by adding a penalty function like 

^S'(x) 2 , 

to /(x), which would yield a gradient component 

2y,g{x)g x . 

This component is directly along the gradient of g(x) and so would be com- 
pletely removed by the gradient projection scheme discussed above. But there 
is no reason why we can not add such a term after the gradient projection. 
This leads to a feedback scheme for solving the constrained minimization 
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problem like 

dx 

An interesting question is whether this is the gradient of some function, per- 
haps some modification of /(x). In general it is not, that is, there is no 
function whose derivative with respect to x is given by this expression. 

Let us consider an example. Suppose we wish to find the minimum of 
Ax 2 +2Bxy + Cy 2 subject to x 2 + y 2 - 1 = 0. 
Here we have x = (x,y) T , so 

/ x = 2(Ax + By, Bx + Cy) T and g x = 2(x,y) T . 
Hence 

g x -g x = 4(x 2 + y 2 ) and g x • f x = 4(Ax 2 + 2B xy + Cy 2 ). 
Consequently 

((ffx • Sx)/x - (fa • /x)?x) = 8((C - A) xy + B (x 2 - y 2 )) (-y, +xf. 
So finally we arrive at a feedback scheme like 

— = -a((C -A) X y + B (x 2 - y 2 )) (-„, +xf - /3(x 2 + y 2 - l)(x, y) T . 

The first term drives the state along circular paths towards one of the two 
point where the circle intersects a line making an angle 6 with the ar-axis, 
where 

(C - A) sin 20 o = 2B cos 20 o • 
The second term drives the state radially towards the unit circle. 

We can easily generalize this example to finding the smallest eigenvalue 
of a symmetric matrix M (of arbitrary size). Here we have to minimize 

/(x) = x T Mx subject to #(x) = x • x — 1 = 0. 

We see that 

/x=2Mx and g x = 2x, 
and so 

f x -g x =4x T Mx and g x • g x = 4x • x. 

As a result we can use the feedback scheme 

dx 

— = -a((x • x) Mx - (x T Mx)x) - (3(x • x - l)x. 

The first term, orthogonal to x, drives the state along the surface of a sphere 
towards one of the two intersections of the sphere with a line through the 
origin in the direction of the sought after eigenvector. The second term drives 
the state radially towards the unit sphere. 

The generalization to more than one constraint should be obvious. We 
start by removing from the gradient of /(x) components in the directions of 



4. Short Range Motion Vision Methods 59 

the gradients of each of the constraint functions g(x)i. Then additional terms 
are added, proportional to each of the functions <7(x); in the direction of their 
gradients. 

Reversal of Gradient Component and Addition of Penalty Terms 

Piatt and Barr have suggested that a modification of gradient descent can be 
applied to the unconstrained problem obtained by introducing a Langrangian 
multiplier [Piatt &; Barr 88]. To mininize /(x) subject to <?(x) = 0, they 
suggest first constructing the function 

F(x, A; p) = /(x) + Aflf(x) + /V(x), 

where A is a Lagrangian multiplier (to be found), while fi is a parameter on 
the penalty term selected to assure global convergence. Typically, the method 
converges when started near the solution even when the penalty term is not 
added. This term is usually needed, however, to prevent divergence when 
at some distance from the solution. This suggests varying the parameter, 
according to a predetermined schedule, as the solution is approached. Now 

^x = /x + Xg x + 2/i#(x)# x , 

Fx = g(x). 

In many cases the feedback scheme 

dx 

~j[ = - a (/x + Aflfx) - ^(X)flfx, 

dX , x 

— = +a0(x), 

leads to the solution (note the positive sign in the second equation). 
Applied to the example above, we have 

F(x,y, A; //) = (Ax 2 + 2B xy + C y 2 ) + A(* 2 + y 2 - 1) + fi(x 2 + y 2 - l) 2 , 

so we obtain the set of equations 

dx 

— = -a((A + X)x + B y) - (3(x 2 + y 2 - l)x, 

^ = -a(Ax + (B + A)y) - (3(x 2 + y 2 - l)y, 
_ = +a ( T 2 + y 2 _ 1} 
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5. Summary and Conclusions 

A number of problems in early vision have been explored here and shown 
to lead to interesting analog networks. The focus was on implementations 
involving resistive networks, perhaps with capacitors and analog multipliers, 
as well as simple amplifiers. In several cases, feedback schemes where shown to 
be considerably simpler to implement than circuits based on the closed form 
solutions usually sought for in digital implementations. It was noted that 
simple feedback networks with local connections can invert local operations. 
This is of interest since the inverses of local operations typically are global, 
and direct implementation of these inverses would require unimplementably 
high wiring densities. 

A theorem giving an equivalence between two apparently quite different 
ways of using the same resistive network sometimes allows one to find a way of 
implementing a particular computation that is much simpler than the obvious 
direct implementation. The use of gradient projection was explored as a way 
of solving constrained minimization problems, although in several cases it was 
possible to avoid this added complication through judicious normalization of 
the terms to be minimized and addition of a penalty term. 

Using a spatial dimension to represent time in a partial differential equa- 
tion was shown to lead to new ways of implementing certain convolutional 
algorithms that would otherwise require a clocked architecture. In this alter- 
nate scheme, image data flows in continuously on one end, while processed 
information flows continuously out the other end. 

Also described here is a novel way of interlacing the nodes of a three- 
dimensional multi-resolution network in a two-dimensional tessellation. The 
number of nodes decreases from layer to layer by sub-sampling after low-passs 
filtering. Each layer contains half the number of nodes in its predecessor. 

It is clear that many early vision problems lend themselves to implemen- 
tation in parallel analog networks. This applies particularly to so-called direct 
methods, as opposed to feature-based methods, since the direct methods deal 
mostly with quantities connected to measurements at individual picture cells 
as well as their relationship to values at neighboring picture cells. Work on 
analog methods for early vision probably started more than twenty years ago, 
but was never very visible. It has now received a strong new impetus from the 
more general availability of facilities for integrated circuit design and fabrica- 
tion. This renewed interest is reflected in the pioneering work at Caltech in 
Carver Meads group [Mead 1989]. But no one should think that the methods 
explored there, or the ideas collected here, comprise anything more than an 
extremely sparse sampling of what is yet to come! 
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